Ideas around anonymous enum types

Tom-Phinney · July 24, 2020, 4:42am

I suspect that this is problematic for generics, and in particular for nested generics, since then there's no reasonable reason to control any of ordering, nesting, or repetition.

Jon-Davis · July 24, 2020, 2:20pm

Can you explain why this is problematic? The reason I think it is ok, is because I don't assume the tuple (A, B, C) to be the same type as (A, (B, C)) so conversely the anon-enum (A | B | C) isn't the same type as (A | (B | C)).

Further the anon-enum ((A | C) | (B | C)) shouldn't be equal to (A | B | C). Because they contain different information A or B or left C or right C for the first enum and A or B or C for the second.

In the case of ((A | C) | (B | C)) I don't know what A and B are, they could be anon-enum's containing C but the code should still be clear as to what information is being encoded.

let anon_enum : ((A | C) | (B | C)) = c; // Error ambiguous assignment
let anon_enum : ((A | C) | (B | C)) = c as (A | C); // Ok assigning to left C
let anon_enum : ((A | C) | (B | C)) = c as (B | C); // Ok assigning to right C

I guess assignment would need to navigate the anon-enum like a tree, and pick the type in the highest level, so long as it is unambiguous.

let anon_enum : ((A | C) | (B | C)) = c; // Error ambiguous assignment
let anon_enum : ((A | C) | (B | C)) = a; // Ok assigned to first a
let anon_enum : ((A | C) | (B | C)) = b; // Ok assigned to first b

Now its possible A and B are the same type, but if the asignment is defined in terms of A than it should resolve to A and if the asignment is defined in terms of B it should be asigned to B (see or_0u8())

That said I still think it is useful to be able to convert between enums with more information to enums with less information.

let anon_enum : ((A | C) | (B | C)) = a; // Ok
let anon_enum2 : (A | B | C) = anon_enum; // Ok
let anon_enum3 : ((A | C) | (B | C)) = anon_enum2; // Error ambiguous assignment where would C go

Tom-Phinney · July 24, 2020, 4:02pm

My bad. I wrote that late last night my time. I was thinking of nested macros from more than one source, but wrote "nested generics". My first concern is that if one macro or generic specifies errors ( A | C ) and another specifies ( D | C | B ), that the unification of the two be independent of the order or repetition of the constituent error types.

Your proposal seems to require that each author of an "anonymous error enum" be aware of the specific element order, and any subnesting, of each other such enum with which it might be unified. To my mind such required awareness contradicts the notion of "anonymous".

Jon-Davis · July 24, 2020, 4:19pm

I see what you mean, I take "anonymous" to mean "unnamed". The anonymous enum doesn't have a type name and the variants do not have names. But the internal structure is still known. I bring up tuples because I see them as "anonymous structs", their type doesn't have a name, and their fields don't have a name, but their internal structure is still known to the user.

So if a macro were to generate tuples (A, C) or (D, C, B), since the ordering matters in the case of "anonymous structs" then I think it's fair if the ordering were to matter in the case of anonymous enums if ( A | C ) or ( D | C | B ) was generated.

robinm · July 24, 2020, 4:53pm

If I understand correctly, this make this pattern hard to write:

type AB = (A | B); // "named" anonymous union
enum BC { B, C }   // same idea

// Somewhere else in the code
fn foo() -> Result<(), AB>;
fn bar() -> Result<(), BC>;

fn baz() -> ??? {
    foo()?;
    bar()?;
    Ok(())
}

The author of the function baz() would look at the documentation of foo() and bar(), and would assume that the returned type of baz() should be Result<(), (AB | BC)>. Obviously it doesn't work, since the use of operator ? would lead to ambiguities. Indeed, the result type of baz() should be Result<(), (A | B | C)>. This requires, when writing the baz() function, to read not only the documentation of foo() and bar(), but also the documentation of AB and BC to remove those abstractions.

Initially, I though it was a bad idea to support this pattern, but the more I think of it, the more I don't think it's an issue. The whole goal of having anonymous struct is to be able to declare foo() and bar() without packing the arguments (neither in type AB = (A | B), nor in enum ErrorType { A, B }).

fn foo() -> Result<(), (A | B)>;
fn bar() -> Result<(), (B | C)>;

And then it become trivial to find the appropriate return type for baz() -> Result<(A | B | C)>`.

Within the context of generics, it shouldn't be a blocking problem either, since it's always possible to explicitly create a new anonymous type using GAT:

fn baz<AB, BC>(
    foo: impl Fn() -> Result<(), AB>,
    bar: impl Fn() -> Result<(), BC>
) -> Result<(), CommonTypes<AB, BC>::type {
    /* ... */
}

In C++ it would be relatively easy to create a variadic template type CommonTypes that would be able to return the equivalent of (A | B | C). I assume it will be the same in Rust when GAT will be implemented.

However:

even if the named anonymous pattern is not really something that we would like to support
even if CommonTypes can be created using GAT (and it's an issue by itself to depend on GAT)

I still don't see where you would want to be able to do the distinction between (A | B) and (B | A), nor where would (A | A) be useful. Having language support for anonymous unordered set of types seems more useful than anonymous ordered tuple of types, especially in the context of returning errors.

Jon-Davis · July 24, 2020, 5:25pm

I don't think this would be ambiguous. foo and bar don't return type B, foo returns type AB and bar returns type BC. if foo were to return b I would expect it to be the first variant of baz and if bar were to return b than it would be the second variant of baz.

I don't think this requires the writer to know the implementation if they simply want to return (AB | BC) however if they wanted to 'unbox' the type they could write (A | B | BC). I don't know if you should be able to unbox a regular "named variant" enum though.

In the context of returning errors I would agree with you. My argument for having a distinction between (A | B) and (B | A) is because it is consistent with a strongly typed language. The tuple (A, B) is not the same as (B, A) even though they contain the same information. However if you are given an anon-enum like (A | (B | C)) you should be able to convert it to an enum (A | B | C) even if they aren't the same type.

CAD97 · July 24, 2020, 5:31pm

Basically, enum(A|B|C) should be a different type from enum(B|A|C) and from enum(A|enum(B|C)), but there should be simple ways to convert nested anonymous products into a canonical flat anonymous product (and without duplicates?), and to shuffle the order of the types around if necessary.

If the compiler imposes an arbitrary total order on types and always lays out anonymous products with tags in that order, converting between nominally different orders can even be a no-op. (Flattening and deduplicating would still require actual byte shuffling, though.)

Jon-Davis · July 24, 2020, 5:40pm

Yes exactly what I had in mind

mcy · July 24, 2020, 5:43pm

One thing I want to bring up with this discussion is that I feel we should be very, very careful to avoid accidentally designing std::variant, which is quite possibly the most insane API in all of C++17, and excellent prior art on how not to do this.

template <typename Variants...> class variant; is the tagged union of the types Variants..., plus a bizarre "poison" state that you can brick the type into by throwing an exception during move construction.

If we have std::variant<int, float> v, we can write std::get_if<int>(v) to get a pointer to the first variant, or nullptr if not present (which we can happily copy or move out of). std::variant<int, int> can be constructed out of an int or a float, and will automagically pick the correct variant. Finally, we can use std::visit as a bizarre form of match:

std::visit([](auto&& x) {
  using T = std::decay_t<decltype(x)>;
  if constexpr (std::is_same_v<T, int>) {
    int x = x; // ...
  } else if constexpr (std::is_same_v<T, float>) {
    float x = x; // ...
  }
}, v);

std::visit - cppreference.com provides some other methods for using std::visit.

This all seems par-for-the-course ugly C++ until you realize that std::variant<int, int> is a valid type, which is a completely reasonable result of instantiating std::variant<T, U>. This results in:

breaking std::visit, which can no longer tell the variants apart, making it useless for generic code.
You must use std::get_if<0> and std::get_if<1>.

What's the upshot? Either we wind up with a totally nuts API like C++, or we're forced to pick one of the following:

Index-only access of variants, like we do with anonymous tuple fields (I'll point out that C++ std::tuple has the same insane API for std::get).
Type-only access, which implies that either (T | U) must be a concrete type wherever it appears, or carries an implied bound of where T != U. God help you if you mix lifetimes into this.

I strongly believe that the former is the only way forward; the latter has a lot of caveats that make it not feel like it's a proper part of the Rust language (where all of our other constructs tend to fit together fairly naturally).

Also, regarding trait impls, we would need to define an "enum safe trait", which is probably "object safe trait" except only associated types and constants, as well as static methods, are still disallowed. We also need to explicitly say that Self -> Self typed functions project through the variant (do we want something like Self -> Box<Self> to project through...?).

(Also, separately, should we also define "tuple safe traits"? This seems like it would be solved by variadics...)

Flattening would still require actual byte shuffling, though.

We could just make the discriminants be TypeId::of::<T>() . (Which wouldn't actually work but needessly-64-bit discriminants seems hilarious to me.)

CAD97 · July 24, 2020, 6:53pm

No, you'd still need to change it.

Consider enum(A|enum(B|C)) -> enum(A|B|C). To start, you have a tag of id_of::<A>() or id_of::<enum(B|C)>(), and a payload of type A or enum(B|C). The target is a tag of id_of::<A>(), id_of::<B>(), or id_of::<C>(), and a payload of type A, B, or C. Because of the restriction that enum(A|enum(B|C)) needs to literally contain an enum(B|C) (in that variant) so you can take a reference to it, there is absolutely no way to make these two representations be the same.

Additionally, we've said we want enum(A|A) to be able to tell the difference between the two variants, so the discriminant can't actually be just a type id (even if you ignore the fact there are still potential type id collisions).

Yes, you're correct that it breaks C++ std::variant/std::visit. But it needn't break Rust's language-level support for structural product types.

"Just" say that type-dispatched matching over for<T> enum(T|u32) is resolved pre-monomorphization (which is impossible for C++ templates IIRC) so that the first variant of the product type takes the T branch, and the second variant takes the u32 branch. No type equality check required.

It would be theoretically possible for the common case in Rust to use type-dispatched matching (based on the written type at the point where the match is written, not the actual type) and to have index-dispatched matching for the resolved case where there are two of the same type that need to be differentiated.

In fact, the latter can be written in terms of a (hygenic) former:

fn visit2<T, U, F, R>(variant: enum(T|U), f: F) -> R
where
    F: FnOnce(T) -> R,
    F: FnOnce(U) -> R,
{
    match variant {
        t: T => f(t),
        u: U => f(u),
    }
}

(Oh whoops can't use that because you can't produce a closure that can work on either type, so take two closures instead (that unfortunately can't consume the same state anymore, so this needs to be a language level feature))

fn visit2<T, U, F, R>(
    variant: enum(T|U),
    f_t: impl FnOnce(T) -> R,
    f_u: impl FnOnce(U) -> R,
) -> R {
    match variant {
        t: T => f_t(t),
        u: U => f_u(u),
    }
}

A proper hygenic nominally type dispatched match can still be a index dispatched match under the covers for Rust. (IIUC, this actually isn't the case for C++, as (outer) templates are resolved first, and at that point it's impossible to tell T and U apart if they both resolve to the same type.)

Jon-Davis · July 24, 2020, 8:01pm

I think this is a great idea. This could result in code that looks like this:

fn readNum(file_name : &str) -> Result<i32, (io::Error | ParseIntError)>;

// And then use the match syntax 
match (readNum("num.text")){
   Ok(success) => (),
   Err(io: io::Error) => (),
   Err(parse: ParseIntError) => (),
}

// which would just be syntax sugar for something like this
match (readNum("num.text")){
   Ok(success) => (),
   Err(ref.0(io)) => (),
   Err(ref.1(parse)) => (),
}

The readability of the first one is definitely better than the second. You would have to import the types but I don't see that as a huge issue.

mcy · July 25, 2020, 1:12am

To be clear, that was meant as a joke, for the exact reasons you pointed out.

Probably not in current C++, but if constexpr combined with SFINAE probably gives you a mechanism for detecting if a type is a template parameter or a concrete type (let's not go there ._.).

But, yes, a hygienic ascription gets us most of the way there. Unfortunately, you still have the std::visit problem that you're completely confused by std::variant<int, int>, and the following code is liable to break:

type A = i32;
type B = u32;
let x: (A | B) = ...;
match x {
  x: A => ...,
  y: B => ...,
}

if A and B ever became the same type (though this is already a breaking change so it's probably ok...?).

Like, type-based matching is great and hits 99% of use-cases, and through the power of not having an insane generic programming paradigm we can even make the generic case Actually Work. Maybe the right answer is secretly "let's not make (T | T) usable without contorting yourself, because it should be named enum anyway", which I think I can get behind.

Except, I did just think of this sadness:

fn foo<T, U>() -> (T | U) { ... }
match foo::<i32, i32>() {
  // um... now what?
}

So I guess "just give up on making (i32 | i32)" usable isn't a great option... =/ Maybe we should just give up and have both like you suggest.

There's also this sadness yet, which I don't have a good answer for:

let x: (i32 | i32) = 42;

C++'s answer is std::variant<int, int> v(std::in_place_index<0>, 42). I think we're going to have to do something functionally equivalent: coersions (personally against coersions in this case but shrug) when there's an obvious right answer, indices otherwise.

Hehe, this reminds me of this alternative to the silly [](auto x) + if constexpr trick:

template<typename... Functors>
struct Merge : public Functors... {
  using Functors::operator()...;
};
template <typename... Functors>
Merge(Functors...) -> Functors<Functors...>;

auto f = Merge([](int x) { ... }, [](float x) { ... });

steven099 · July 25, 2020, 2:20am

Perhaps I'm mistaken, but it's looking like a lot of the headaches are coming from making the anonymous enum positional rather than normalizing it. If the position of a variant in the enum has semantic meaning, at that point it seems to me like it would be better to define an explicit enum so that you can document that meaning. The motivation of anonymous enums seems to be more about being able to have a type which can represent the union of multiple existing types without needing to add redundant labels for each representable type.

In terms of allowing anonymous enums to implement traits, perhaps the solution is to allow traits to be implemented for "atomic" anonymous enum types, i.e.

impl<T: Trait> Trait for (T|)

Then (T|U) would support the intersection of traits implemented for (T|) and for (U|). This might require that enum atoms aren't initially considered fundamental, so that traits have the opportunity to add blanket impls before downstream crates can start creating these impls for their own types (unless specialization is stabilized). Would this solve the issues with having traits on anonymous enums implied by their variants?

Edit: The answer is probably "no" as written. It starts to seem like you either need variadic generics to get this to work, or a very clever compiler. If full normalization is done (flattening, deduplicating, and order independence) then you need some way to destructure down to a root case. Potentially if you did an impl for (T|U) and had logic for handling either T or U, the compiler could use that logic to break down any enum, but again, that would require a very clever compiler.

Atoms might also solve the type inference issues. As in

let x = if cond { (5u8|) } else { (7u32|) };

Essentially, by having the arms be already anonymous enum typed, the compiler knows not to try to unify the arms, but instead make the if expression evaluate to the union of the arm types.

Bikeshed on anonymous enum patterns: (x |: u8)

CAD97 · July 25, 2020, 2:34am

There's a completely different set of headaches in "normalizing" the product types. The key one is

let x: (T | u8);
match x {
    _: T => println!("T"),
    _: u8 => println!("u8"),
}

which in the case of T=u8 has the u8 case or the T case magically jumping to the other match arm.

If you think every match is just dispatching the same operation based on what type it is, then maybe you can rationalize this as "just" specialization. But, it's also specialization, with all of that baggage:

fn specialize<T>(t: T) {
    let t: (T | u8 | u16 | u32) = t;
    match t {
        t: u8 => specialize_u8(t),
        t: u16 => specialize_u16(t),
        t: u32 => specialize_u32(t),
        t: T => specialize_default(t),
    }
}

mcy:

But, yes, a hygienic ascription gets us most of the way there. Unfortunately, you still have the std::visit problem that you're completely confused by std::variant<int, int> , and the following code is liable to break:
type A = i32;
type B = u32;
let x: (A | B) = ...;
match x {
  x: A => ...,
  y: B => ...,
}
if A and B ever became the same type (though this is already a breaking change so it's probably ok...?).

This could be additionally guarded against by the "unreachable match arms" lint to point it out and say "hey, you probably didn't mean this". The lint would probably have a specialized version for two different type aliases that resolve to the same type.

steven099 · July 25, 2020, 2:39am

This seems fine to me . I can see how it might be a consideration whether this is desirable, but it seems to follow from the normal behaviour of matches, just with type parameters in the mix.

Based on my previous comment, this could potentially be written as:

fn specialize<T>(t: T) {
    match (t|) {
        (t |: u8)  => specialize_u8(t),
        (t |: u16) => specialize_u16(t),
        (t |: u32) => specialize_u32(t),
        (t |: T)   => specialize_default(t),
    }
}

Edit: I'm basically caught in a loop of agreeing and disagreeing with myself. I'm sort of thinking "don't allow (T|T) to be instantiated directly or as part of a function signature, and observe hygiene inside generic functions" might be the "right" answer. Workaround at the call site using a newtype struct if you have to.

Jon-Davis · July 25, 2020, 3:34am

Summary

This thread is fairly long, and every post is an essay unto itself so I wanted to make a summary of where my current ideas are with the feedback incorporated from the above discussion edit: and some below.

Order Dependence

Anonymous Enums would be strongly typed. I.E (A | B) != (B | A). My reasoning is it is consistent with the rust type system and with anonymous structs (tuples). And is required for the indexed matching scheme.

Matching

The matching would be index based, consistent with tuples, the example I have been giving looks like this

let anon_enum3 : (A | B | C) = ...;
match anon_enum3 {
    ref.0(a) => (),
    ref.1(b) => (),
    ref.2(c) => (), 
}

The keyword used could be something else like Self.0(a), enum.0(a) or even 0(a), I have been using ref arbitrarily.

Coercion

When possible you should be able to coerce one anonymous enum to another, so long as no assignment would be ambiguous.

let anon_enum : (A | (B | C)) = a;        //Ok
let anon_enum2 : (A | B | C) = anon_enum; //Ok

// these are still different types
anon_enum == anon_enum2 // Error can't compare type (A | (B | C)) with (A | B | C)

With this the user may specify if they wish to flatten an anonymous enum or change the order around. With compiler optimizations, their internal representation might be the same so this could be a no-op in some cases.

Assignment

Assignments must be unambiguous as to what variant is being assigned. Only assignments where the rhs of the assignment is a first-level type variant are allowed.

// Example 1: Simple Case
let _: (A | A) = a; // Error ambiguous assignment
// Example 3: Order of assignments
let _: (A | (A | B)) = a;            // Ok assigned to first a
let _: (A | (A | B)) = a as (A | B); // Ok assigned to second a

Generics

Generic implementations will be consistent, and determined using the local generic types and not the concrete type.

fn or_u8<T>(t : T) -> (T | u8) {
   if random(){
     t // Regardless of what T is, this will always be the first variant
   } else {
     0u8 // Regardless of what T is, this will always be the second variant
   }
}

let _ : (u8 | u8) = 3u8; // Error ambiguous assignment
let anon_enum : (u8 | u8) = or_u8(5u8); // Ok

// This will always assert(true) because the assignment is unambiguous
match anon_enum {
   ref.0(n) => assert!(n == 5u8),
   ref.1(n) => assert!(n == 0u8),
}

Type Matching

Type matching would be syntactic sugar of indexed matching and would allow you to match on the locally defined types.

let anon_enum : (T | u8) = ...;
// Example of type matching
match anon_enum {
   a  T => ...;
   b: u8 => ...;
}
// This is the equivalent of 
match anon_enum {
   ref.0(a) => ...,
   ref.1(b) => ...,
}

Even if T was u8 the match would be predictable based on whether anon_enum was set through T or u8

I imagine this would be the common case, how matching on anonymous enums is most often done. However type matching wouldn't work in all cases, such as (u8 | u8), which is why indexed matching would be the underlying implementation.

Traits

One of the primary use cases for Anonymous Enums would be for error handling, so ideally we would like anonymous enums to be able inherit traits from their variants conveniently. It has been stated before that it may be unsafe and unsound to simply inherit all shared traits. An analogue of "object safe traits" could be used for automatically implementing "product safe traits".

CAD97 · July 25, 2020, 4:08am

The two adjustments I would make:

Only allow assignment if the rhs of the assignment is a first-level type variant. e.g. _: (A | (B|C)) = a and _: (A | (B|C)) = b as (B|C) is ok, but _: (A | (B|C)) = b is not. This removes the "conflict at the same level" and "lowest level" rules in favor of a much simpler rule of "the unambiguous first-level variant".
Mention the possibility of an analogue of "object safe traits" for structural product types. While explicit hints are probably fine (and potentially even desirable) initially, automatically implementing the "product safe traits" would likely be a fairly simple extension for a huge ease-of-use benefit. (I think the current definition of object-safe traits should be correct for product-safe traits? We'd like to expand what is object-safe in the future, so making product-safe a separate thing would unfortunately be required, as it's more restrictive.)

And as a side note: I'm fairly certain we could potentially make the indexed matching syntax just 0(a) as a direct analogue of tuple member accessing, a.0. I fail to see any syntactical issues, as numeric literals are never callable.

steven099 · July 25, 2020, 4:17am

I was about to say, could object safety be leveraged here? Basically, methods that are callable on an object type that can be constructed from all variants are callable on the enum. If the trait is implemented for its own object type, then it is implemented for the enum type.

Edit: There's a discrepancy around sizedness with object safety. Enum types are presumably sized, but methods with a where Self: Sized bound are always considered object safe even if they break other object safety rules, since they aren't callable on object types anyways. Enum safety can't have this particular rule. Traits which currently inherit from Sized specifically to exclude object types would now have their safety rationale broken.

robinm · July 25, 2020, 7:12am

I still fail to see where (A | A), and having (A | B) different from (B | A) would be useful. Could you please provide concrete examples. And could you please explain why regular enum wouldn't be a good fit for those use-cases. I feel that we are trying to solve a non existing problem.

luben · July 25, 2020, 1:38pm

I am a little bit confused with the later comments when they refer to anonymous enum types as "structural product types". In my understanding, product types are the structs and tuples: the number of the values that can inhabit them is the product of the number of values that can inhabit each ot their members, e.g. (bool, u8) can be inhabited by 2*256=512 distinct values. Similarly, enums, unions, and their anonymous variants that we discuss here are sum types for the same reason, e.g, (bool | u8) could be inhabited by 2 + 256 = 258 distinct values.

I understand that the proposal is more involved but I think the above reasoning and common terminology should still hold. Am I missing something?

Topic		Replies	Views
Pre-RFC: Anonymous variant types language design	93	6423	March 25, 2019
pre-RFC: anonymous enums language design	13	5614	March 25, 2019
Pre-RFC: sum-enums language design	86	8208	April 16, 2019
Concept RFC: Tuple Enums	32	3475	November 12, 2020
Pre-RFC: Using existing structs and tuple-structs as enum variants language design	40	5114	March 25, 2019