Ideas around anonymous enum types

I think it might benefit the conversation if a list of known issues such as the above (&'a T | &'b T) could be compiled into a single post. I would like to say that I am not against anonymous unions on a personal level, it's just to my eyes this broad concept has seemed to have stalled due to a large of number issues that when looking deeper emerge. Personally, I would like to see some "anonymous enum/union/sum type" in rust in the next few years, and not in limbo for a decade. With a solid list we could

  1. Create a comprehensive solution and make progress on the anonymous unions concept.

or, if that fails

  1. Use it as reference to why a different approach is required.

Unfortunately I don't have the technical know-how, on how a lot of this works, and would probably misrepresent the issues.


I think both the union and sum type approach could handle this use case with pattern matching, both of them would result in a type match like:

let result : Result<(A | B), (X | Y)> = ...;
match result {
   Ok(a : A) => ...,
   Ok(b : B) => ...,
// Ok(both) => ...,
   Err(x : X) => ...,
   Err(y : Y) => ...,
// Err(both) => ...,
}

Additionally we would want to keep the Err variants together if we simply wanted to log the error.

match result {
   Ok(a : A) => ...,
   Ok(b : B) => ...,
   Err(dont_care_which) => log!(dont_care_which),
}

I think keeping enums separate is ideal for both anon unions and anon sum types.

2 Likes

What's about matching on Result<(T|Option<U>),E>?

Shouldn't this be expected to works?

match result {
    Ok(t: T) => …,
    Ok(Some(u)) => …, // u as type U
    Ok(None) => …,
    Err(e) => …, // e as type E
}

However, I'm not sure what should be done here:

let x = (Option<A> | Option<B>);
match x {
    None => …, // Should this line compile?
    other => …, // What is the type of other? `(Some(A) | Some(B))`?
}
2 Likes

my guess is like robinm's but I think we would probably need to be specific in both branches of the anon-enum to signal we are switching to Type matching from enum matching.

match result {
    Ok(t: T) => …,
    Ok(Some(u): Option<U>) => …, 
    Ok(None: Option<U>) => …,
    Err(e) => …, 
}

Hmm I think you would need to write it like this?

match x {
    Some(a): Option<A> => ...,
    Some(b): Option<B> => ...,
    None: Option<A> | None: Option<B> => ...,
}

Instead of None you could also use the catch all _ =>


1 Like

Or even

match x {
    Some(a: A) => ...,
    Some(b: B) => ...,
    _ => ....,
}

Also the pattern None: Option<A> | None: Option<B> is invalid: None have two types and has no preferred one.

This collaborates a lot with type ascriptions.

1 Like

I'm a little confused, why is it invalid? The | operator there is the multiple match pattern operator, such as in

match x {
   0 | 1 => ...,
   2 => ...,
   _ => ...,
}

I would imagine multiple match patterns would be valid in type matching as they are valid in current matching.

I wanted specifically to match on None. Maybe this could work:

match x {
    None : (Option<A> | Option<B>) => …,
    _ => …,
}

Anyway I don't think it's that important, it should just be clarified, whatever form is prefered.

1 Like

Next hard case is matching on Option<T>|Option<T> (does it coerce into Option?)

Oh, I'm sorry for that, I forgot that None is still type. And there no conditional binding of data, which is illegal (Ok(v) | Err(e) is incorrect pattern because we don't know which binding, v or e is initialized), I got confused too.

Anyway it's too confusing pattern. Same variant from two types. Also there was a proposal to make enum members true types.

Putting my own interpretation for these different examples:

match it {
    Ok(t: T) => (),
    Ok(Some(u: U)) => (),
    Ok(None::<U>) => (),
    Err(e) => (),
}

other would still be (Option<A> | Option<B>). I think None on its own should not be allowed, as it's ambiguous whether it means None::<A>, None::<B>, or both. I'd probably write the given match as

match it {
    Some(a: A) => (),
    Some(b: B) => (),
    None::<A> | None::<B> => (),
}

As there are two variants with the same type, you'd have to use the index-based destructuring.

match it {
    0(Some(t)) => (),
    0(None) => (),
    1(Some(t)) => (),
    1(None) => (),
}

If you wanted to be really pretentious, the anonymous sum type could be written (A + B). I think it's exactly the confusion between sum and union types that I've always preferred writing it as enum(A | B) when talking about the idea divorced from a specific syntax proposal.

2 Likes

Questions Regarding sum types

I have a few questions for those more knowledgeable than me regarding sum types.

Question 1

Assuming the following is a valid type match

let st : (A | B) = ...
match st {
   a: A => ...,
   b: B => ...
}

And Assuming coercion into more normalized forms is permitted

let a : (A | (A | B)) = ...;
let b : (A | B) = a; // Valid

Could a type match be a coercion site into a normalized sum type described by it's branch typing

let st : (A | (A | B)) = ...;
match st { // coerced into A | B here
   a: A => ...,
   b: B => ...,
}

Question 2

Assume we have a sum types with a type aliases

type AB = (A | B)
type BC = (B | C)

I presume this would work

let st : (AB | BC) = ...
match st {
   ab: AB => ...,
   bc: BC => ...,
}

would coercing from type aliases into a normalized sum type work

let st : (AB | BC) = ...;
let st : (A | B | C) = st; // coerce into normalized form unpacking aliasing

Question 3

If both question 1 and 2, would this work

let st : (AB | BC) = ...;
match st { // coerce into A | B | C
    a : A => ...,
    b : B => ...,
    c : C => ...,
}

I know you wouldn't be able to mix and match these two as

match abbc {
   a: A => ...,  
   bc: BC => ..., 
}

would be the same coercion as:

(A | (B | C)) = ((A | B) | (B | C))

which is invalid as there is no top level B on the lhs.

I would say no. Reasoning: you want a: A to only take the first arm, so if the second arm is ab: (A|B) it is the second variant. With proper generalized type ascription you could do match st: (A|B) {, or with just as you could do match st as (A|B) { if you want that behavior.

In more language reference terms: the input expression of a match expression is not a coercion point for anonymous sum types.

Yes; Rust's type aliases are completely transparent and the same as just writing their RHS.

Precondition is false. Postcondition is also false for the same as in question one; type (ascription) based matching is just sugar for index-based matching the root variant with said type.


So that said, the obvious next question: is the following allowed?

match it as enum(enum(A | B) | enum(C | D)) {
    a: A => (),
    b: B => (),
    c: C => (),
    d: D => (),
}

I think the only consistent answer is no.

Perhaps, in the vein of match ergonomics default binding modes, we can introduce more permissive but also more complicated rules in the future, but I think starting with the simplest possible rule initially (where type based sum destructuring is just sugar for index based sum destructuring) is the best path forward.

1 Like

I fully agree with this, and this could be added later. I imagined the following:

Edit 2 Disregard this post actually, This would cause an ownership issue as matching it would require a move and coercing a &(A | (B | C)) into a &(A | B | C) is not valid for sum types.

let st : (A | (A | B)) = ...;
match st { // coerced into A | B here
   a: A => ...,
   b: B => ...,
}

would desugar into

let st : (A | (A | B)) = ...;
let _ : (A | B) = st; // based on match arms
match _ {
   0(a) => ...;
   1(b) => ...;
}

and this

would desugar into

let _ (A | B | C | D) = enum(enum(A | B) | enum(C | D));
match _ {
    0(a) => (),
    1(b) => (),
    2(c) => (),
    3(d) => (),
}

Implementation wise, type matching may be easier to implement if coercion was allowed in the match expression, not that ordering is particularly complicated problem to solve though.

Edit We would have to implement a coercion site both in a match statement and in nested patterns so I take the "easier" back. Branch level coercion should be avoid too.

let st : (B | C | A) =...;
// Type matching  | no coercion    | coercion
                  |                | let _ : (A | B | C) = st; //no-op coercion
match st {        | match st {     | match _ {
   a: A => ...,   |   3(a) => ..., |   0(a) => ...,
   b: B => ...,   |   1(b) => ..., |   1(b) => ...,
   c: C => ...,   |   2(c) => ..., |   2(c) => ...,
}                 | }              | }

From the point of view of the "type level sets" proposal my answer is yes to all 3 questions. The proposal allows to match on sum types, so the syntax in the second question is allowed together with explicit match on A | B and B | A. I think Rust should always normalize sum types, so (A | (A | B)), (AB | BC), (B | A), etc. will be essentially just aliases for (A | B).

Note that with that proposal you will have to be explicit during value creation:

let a: (A | B) = ...;
let b: (A | B | C) = a; //error
let b: (A | B | C) = become a; // ok

One not so obvious property is that match will be able to change tag value under the hood:

// let's say A, B and C have tags 0, 1, and 2 respectively
type ABC = (A | B | C);
// B has tag 0, and C has tag 1
type BC = (B | C);
let t: ABC = ..;
match t {
    v: A => { .. }
    v: BC => { .. } // Rust will re-map tags here
}

To make such re-mappings less frequent, I've proposed to use tags based on TypeId.

I think such types in explicit form should be forbidden (i.e. you will not be able to write (&'a u32 | &'b u32 | u64)) and in generic contexts type parameter would have to be bound by lifetime which lives no longer than any other lifetime present in this sum type. For example, bound could look like that: (T + 'a, &'b u8, &'c u32) where 'b: 'a, 'c: 'a. Yes, it's a serious limitation, but I think it will not be a big problem in practice.

Alternatively we would need a way to specify a bound which will guarantee that T will not be a sum type containing &u8 or &u32.

Personally, I feel that the last few posts in this thread indicate that this proposed feature is not sufficiently user-friendly to warrant adoption. :-1:

1 Like

Could you please be more specific?

Do you dislike the potential coercion inside match? It's just an implementation detail, which will "just work" and I believe should be quite intuitive.

Is it about interaction with lifetimes in generic contexts? Even without such feature, it's usually quite advanced topic. Yes, with it you would have to satisfy some peculiar constraints, but nothing mind-blowing in my opinion. And I think most users will rarely encounter such edge cases.

Please, I know I messed up the terminology earlier, but do try to get it straight. The type level set you're talking about is a union type, not a sum type.

I do apologise for any confusion but it's an unfortunate reality of two meaningfully distinct but very similar features being proposed and being pitted against each other.

It's at this point I have to ask readers to try to separate @newpavlov's union type from the main proposal here, which is anonymous enums, which is a sum type. I personally believe that a union type (beyond making dyn Any better to work with) does not carry its weight for Rust, but an anonymous enum could for the same reason we have anonymous structs (tuples).

But to be frank: I think the potential confusion of people using a feature expecting one and getting the other (between sum types and union types) makes the addition of either (or both!) require a higher standard that I don't think is possible to meet.

And please, a union type is not an anonymous enum. An enum is a sum type, not a union type, though a sum type is a "tagged" union type.

Especially if it's based on TypeId, @newpavlov, what you're describing is "enum impl Any", not an anonymous enum.


A quick bit of type theory to explain the Algebraic Data Type terminology:

A "type" is a set of possible values that an instance of that type could be. For example, the type u8 is the set of potential values {0, 1, 2, ..., 255}.

A "product type" is what you get by taking the product of two sets. If you multiply u8 by u8, you get the tuple type (u8, u8), which is the set of values {(0,0), (0,1), ... (1,0), (1,1), ..., ..., (255,255)}. Note that if there are N potential values in the first set and M in the second set, the number of values in the resulting set is the product, N×M.

Sum types and union types break the perfect mathematical analogy and take a little bit of adapting to work.

A "sum type" is what you get by taking the "sum" of two sets to get a set of size N+M. Depending on who you ask, you can't actually "add" sets together, because an item can't be in a set twice. Some will tell you that you can add two sets, but only if they're nonoverlapping. To get two nonoverlapping sets, we tag each value from each set with which set it came from, and then take the union of the sets. So if we take the sum of u8 and u8, we get {Left 0, Left 1, ..., Left 255, Right 0, Right 1, ..., Right 255}.

A "union type" is what you get by taking the regular union of two sets. If you union i8 with u8, you get the potential set of values {-128, -127, ..., 255}. Of course, it's not as clean as that, as 0i8 is actually a different value from 0u8, so we find the unioned type set is actually {-128i8, -127i8, ..., 127i8, 0u8, 1u8, ..., 255u8}. In the case where two values of the same type are present in the set of possible values, they are collapsed to the one set member per normal set behavior. IOW, the union of u8 and u8 is just u8 again.

5 Likes

To explicitly reiterate: I don't think we can have anonymous sum or union types, because it'll never be properly clear which one the feature is supposed to be.

enum currently is closest to an algebraic sum type, because each variant is a tag around some existing type(s).

But a sum type is just a union type where you tag the input types, so an enum is "just" a tagged algebraic union type (which, again, is exactly a sum type).

But then there are the unaccepted (I cannot stress enough that doing language design off of unaccepted extensions to the language is a bad idea) extension proposals to make enum variants proper types, and even to allow using existing types directly as enum variants, which brings them to being a proper union type (with convenient sugar for making them act like sum types).

So I guess the point of this post is to say that trying to design an anonymous enum is a fruitless task until after enum variants as types is decided one way or the other.

3 Likes

Yes, I agree, I've messed-up the terminology (the distinction was mentioned in the previous discussions as well, but I forgot about it), all my posts are about "union types". Unfortunately today union has a different meaning in Rust, the one inherited from C, so it complicates situation even further.

Since use-cases of "union types" and "anonymous enums" overlap too much, I highly doubt we will see both features implemented. So it's inevitable that they get discussed simultaneously as competing proposals.

IIRC RFC 2593 does not allow to use existing types directly as enum variants, only to use enum variants as types, so it does not quite change enum to "a proper union type". Also this RFC is really popular and quite non-controversial, so I believe probability of its acceptance is really high.

I don't have a reference for it, but I know I've seen people discuss the extension to "enum variants are types" to "types as enum variants" on this forum at least.

And even just "enum variants are types" takes enum from the current "mostly a sum type" to "mostly a union type", as it would be a union type of its variant types (Result would roughly be the union type type Result<T, E> = ( Result::Ok(T) | Result::Err(E) )).