Pre-RFC: sum-enums

newpavlov · November 7, 2018, 5:07pm

UPD: See the next version of this proposal: Pre-RFC: type-level sets

This proposal is a draft of alternative to RFC 2587. This proposal is less flashed out compared to the RFC 2587 and provides a general direction, which I believe is better to pursue.

About terminology

The proposed construct will be called sum-enum to distinguish it from the existing enums. Some prefer to call such construct union type, but unfortunately Rust already has a different kind of unions. Just consider it a temporary name. (better naming proposals are welcomed of course)

Summary

Introduce a new syntax enum(A, B, C, ..) and enum Foo(A, B, C, ..)
Utilize type ascription in patterns to perform matching on sum-enums

Motivation

There is two main use-cases for this proposal:

Removing a need for short-lived “throw-away” enums:

 // enum(..) implements `From` for each "summed" type,
 // so `?` works as expected
 fn foo(val: Weak<str>) -> Result<i64, enum(NoneError, ParseIntError)> {
     let strref = Weak::upgrade(val)? 
     let num = i64::from_str_radix(strref, 10_u32)?
     Ok(num)
 }
 
 match foo(val) {
     Ok(n) => { .. },
     // equivalent to `Err(NoneError: NoneError)` and `Err(_: NoneError)`
     Err(NoneError) => { .. },
     // note that for now you can't write `Err(err): Err(ParseIntError)`
     Err(err: ParseIntError) => { .. }
 }

Allow to return several types using impl Trait by automatically creating a sum-enum:

// output of this function is a sum over 2 anonymous `impl Trait` types,
// variants are automatically collected by compiler
fn foo<'a>(data: &'a [u32], f: bool) -> impl Iterator<Item=u32> + 'a {
    if f {
        // `into` is required for conversion into implicitly constructed
        // sum-enum
        data.iter().map(|x| 2*x).into()
    } else {
        data.iter().map(|x| x + 2).into()
    }
}

Explanation

Existing enums can be seen as a sum-type which implicitly creates wrapper type for each variant (it’s not quite how it works today of course), so we can generalize this behavior and introduce a tuple-like syntax based on existing enum keyword:

enum Foo(u32, u64)
// or alternatively
type Foo = enum(u32, u64);
type Bar<T> = enum(u32, T);

fn foo() -> enum(u32, u64) {
    if condition() {
        1u32.into()
    } else {
        1u64.into()
    }
}
fn foo<T>() -> enum(u32, T) { .. }

Considering sum type nature of sum-enums, they have the following properties:

enum(A, B) is equivalent to enum(B, A)
enum(enum(A, B), enum(C, D)) is equivalent to enum(A, B, C, D)
enum(A, B, A) is equivalent to enum(A, B)
enum(A) is equivalent to A
enum(A, B, !) is equivalent to enum(A, B)

The main way to create sum-enums will be to use Into trait implementation.

In cases when ordering of type arguments is required it can be done based on TypeId of each type.

Internally sum(u32, u64) can be represented as:

union SumUnion_u32_u64 {
    f1: u32,
    f2: u64,
}

struct SumEnum_u32_u64 {
    // tag is TypeId based, and can be u8/u16/... depending
    // on a number of variants
    tag: u8,
    union: SumUnion_u32_u64, 
}

Alternatively for tag we could use TypeId directly, it will simplify conversions between sum-enums, but will result in a 8-byte overhead, while usually having just 1 byte will be enough.

Matching on sum-unions will be the same as for usual enums:

match val: enum(u32, u64, ()) {
    v: u32 => { .. },
    _: u64 => { .. },
    // type ascription can be omitted
    () => { .. }
}

// see motivation example
match foo(val) {
    Ok(n) => { .. },
    Err(NoneError) => { .. },
    Err(err: ParseIntError) => { .. }
}

match val: enum(u32, u64, ()) {
    v: enum(u32, u64) => { .. },
    () => { .. },
}

match val: enum(u32, u64, ()) {
    v: u64 => { .. },
    // `v` will have type `enum(u32, u64, ())`
    v => { .. },
}

In the last example we don not convert type of v to enum(u32, ()) for several reasons:

Simplicity of implementation
Potentially incompatible memory layouts
Being coherent with the existing match behavior

Sum-enum will implement a minimal set of traits of included types. In other words trait is implemented for sum-enum only when all its variant types implement it.

Generic code

One of the issues often mentioned in discussions of sum types is problems with generic code, for example:

// what will happen if U == V?
fn foo<T, V>(val: enum(T, V)) {
    match val {
        v: U => { .. },
        v: V => { .. },
    }
}

Arguably considering monomorphization this code is very similar to this code:

// `get_id` is a static method of `MyTrait`
fn foo<U: MyTrait, V: MyTrait>(input: u32) {
    match input {
        n if U::get_id() == n => { .. },
        n if V::get_id() == n => { .. },
        _ => { .. },
    }
}

In other words we can “solve” this problem by (re)specifying that match arms are evaluated and executed in order, so if U and V (e.g. u32) have the same type foo will get monomorphized into the follwoing function:

fn foo2(val: u32) {
    match val {
        v: u32 => { .. },
        v: u32 => { .. },
    }
}

It’s obvious that only the first arm will be executed and the second one will be always ignores (and removed by optimizer). To prevent potential bugs compiler can issue unreachable_patterns warnings.

If one of generic types will be sum-enum, this case is handled by ability to include sum-enums into match arms (see third match example). If U = enum(A, B) and V = enum(B, C), then we’ll get the following code:

// enum(enum(A, B), enum(B, C)) == enum(A, B, C)
fn foo(val: enum(enum(A, B), enum(B, C))) {
    match val {
        v: enum(A, B) => { .. },
        v: enum(B, C) => { .. },
    }
}

It’s obvious that if variant has type B it will always go to the first arm, and the second arm will always get variant with type C. So code will work without any problems, and result will be predictable.

Possible quality-of-life features

In addition to the basic functionality we can introduce a trait which will allow as to generalize over sum-enums and to do various conversions.

trait SumEnum {
    /// return TypeId of the current variant
    fn get_type_id(&self) -> TypeId;
    /// get Iteratore which contains possible variants `TypeId`
    fn get_type_ids() -> impl Iterator<Item=TypeId>;
    /// create a new sum-enum from provided value if possible
    fn store<T: Sized>(val: T) -> Result<Self, Error>;
    /// convert to another sum-enum if possible
    fn convert_into<T: SumEnum>(self) -> Result<T, Error>;
    /// tries to extract variant with type T
    fn extract<T: Sized>(self) -> Result<T, Error>;
    fn can_store(tid: TypeId) -> bool {
        Self::get_type_ids().any(|&t| t == tid)
    }
}

Additionally we also could automatically implement TryInto/TryFrom traits for variant types. Ideally conversions should be able to check if conversions can be done at compile time, but unfortunately currently Rust does not provide tools for that, maybe in future with advancement of cons fns.

Sum-enum as a generalized enum

As was mentioned sum-enums can be viewed as a generalization of enum:

// this enum
enum A {
    Foo,
    Bar(u32),
    Baz{f: u64},
}
// can be (theoretically) desugared as
struct Foo;
struct Bar(u32);
struct Baz{f: u64};

enum A(Foo, Bar, Baz)

This could’ve automatically solved problem of joining nested enums (which currently is not always handled optimally), in other words Option<Option<u8>> would’ve been a sugar for enum(Some(Some(u8)), Some(None), None). Also matching would’ve been unified for usual and sum-enums.

Unfortunately this change is backwards incompatible (e.g. A::Bar currently has type Fn(u32) -> A), but nevertheless I think it’s an interesting idea to consider.

Unresolved questions

Naming: sum type, sum-union, type union, etc.
Keyword: enum(..) vs union(..) vs something else
Delimiter: enum(A, B), enum(A | B), enum(A + B)
Tag construction: auto-incremental approach, truncating TypeId and dealing with collisions
Infallible generic conversion of “subset” sum-enums to a wider sum-enum. (e.g. enum(u8, u16) to enum(u8, u16, u32))
Interaction with lifetimes. To start things off we could restrict sum-enum usage only with types which ascribe to 'static. (the same restriction as currently for TypeId)
impl Trait variations: some have proposed to use enum impl Trait, or using enum keyword for converting values to sum-enums
Details on how unification and handling of sum-enums should be done internally.

skysch · November 7, 2018, 8:47pm

newpavlov:

In other words we can “solve” this problem by (re)specifying that match arms are evaluated and executed in order, so if U and V (e.g. u32) have the same type foo will get monomorphized into the follwoing function:
fn foo2(val: u32) {
    match val {
        v: u32 => { .. },
        v: u32 => { .. },
    }
}
It’s obvious that only the first arm will be executed and the second one will be always ignores (and removed by optimizer). To prevent potential bugs compiler can issue unreachable_patterns warnings.

Does this work well in user code?

Lets say we have a generic function

fn make<T, S>(t: T, s: S) -> enum(T, S)

and a consumer:

fn consume<T, S>(input: enum(T, S)) {
	match input {
		v: T => { .. },
		v: S => { .. }, // Does the writer of this code know what order to put these lines?
	}
}

Then someone calls this code:

consume(make(0i32, 1i32)) // Does the writer of this code know they are getting the `T` path rather than the `S` path?

Who gets the unreachable code warning, and what should they do about it? (~~And what about the completely unused i32 value? Is that silently swallowed into the void? If so, that doesn't bode well for error handling...~~ EDIT: This it make's fault, so not a problem here.)

newpavlov · November 7, 2018, 9:01pm

Can you specify how make works in your case? To construct sum-enum you will have to choose either t, or s.

I don’t see much difference between matching generic sum-enums and code like this:

fn foo(input: u32, flag1: u32, flag2: u32) {
    match input {
        v if v == flag1 => { .. },
        v if v == flag2 => { .. },
        _ => { .. },
    }
}

flag1 and flag2 can be passed from somewhere far-away, and they can be equal, which will lead to second arm being effectively dead. (to make situation even more similar replace function arguments with const N: u32)

So I don’t believe that sum-enums introduce something radically new in this regard.

mcy · November 7, 2018, 9:05pm

I’ll point out that using ascription patterns is incompatible with how ascription patterns are used today. It is not clear that x: T asserts that x is a T or that x is the variant of type T. Moreover, the following code’s behavior has been left undefined:

fn foo<T, U>(x: enum(T, U)) {
  match x {
    t: T => panic!("T"),
    u: U => panic!("U"),
  }
}

foo::<i32, i32>(x)

The only solution to this problem is to somehow ban that monomorphization or to ban generic anonymous sums (or, to introduce where T != U clauses, which is overkill).

RustyYato · November 7, 2018, 9:10pm

~~This makes changing the order of match arms a foot-gun (possibly a breaking change), because it changes which arm gets executed, which could lead to wildly different behavior.~~

edit: I was wrong

skysch · November 7, 2018, 9:13pm

Ah, you’re right. The make fn is the thing that’s dropping the second argument, and it has to do that intentionally.

newpavlov · November 7, 2018, 9:14pm

Can you be more specific why it's incompatible? Because according to RFC 2522 it works exactly as was used in this proposal, i.e. x: T asserts that x has type T.

Well, actually I've also added that Err(err): Err(ParseIntError) is legal and it does not fit into generalized type ascription (without making enum a sugar as was describe that is), so I'll remove that part.

Have you read the proposal? I've allocated the whole section to discuss this problem.

How is it principally different from usual match or branching code dependent on constants or results from static methods?

skysch · November 7, 2018, 9:20pm

I think you hit a confusing problem if you call my consume fn with enums representing the error output of two different paths that happen to share a common value. Each path may associate a different semantic meaning to the common type, but consume can only see them as a single type, so maybe in that case you’d just prefer a regular enum.

Which leads me to the same problem I brought up in the other thread: where would this syntax be preferable to a named enum when both provide the same functionality? It seems safer just to use regular enums in this case.

RustyYato · November 7, 2018, 9:24pm

Yes, I agree with this, I don’t like how sum-enums play with generics.

mcy · November 7, 2018, 9:29pm

That is an extremely bizarre way of looking at sum types, because the following is thus legal:

match x {
  x: enum(A, B) => ..
  x: A => ..
}

But we cannot possibly have A <: enum(A, B), because you cannot reinterpret &A as &enum(A, B) (where does the discriminant live?), even though &T is covariant in T!

Sorry, I must have skipped it. That said, I do not think your solution is appropriate. This is very much a footgun when people proceed to use enum(A, B) as an either, which is what many, including myself, expect it to do!

I'll add that, while I like mathematically neat systems, professional experience has shown that taking footguns away from people trumps in this case.

eaglgenes101 · November 7, 2018, 9:31pm

The solution I could determine from my reading was that the first matching type arm according to the monomorphized types wins. Neatly resolves the problem of getting things to compile to code without causing confusing compile errors, but the footgun potential is real.

May the best RFC proposal win.

newpavlov · November 7, 2018, 9:41pm

See motivation examples, sometimes you can't create such enum (return several types under a single impl Trait), and sometimes you don't want to (short-lived error enums). So sum-enums in generic code should be pretty rare and must be viewed with a bit of suspicion, but I don't think that it's a reason to forbid them in generic contexts outright. Or create some ways to add bounds U != V. Especially considering that you already can write "footgunny" code like this today using branching on type static method results. (see the new example instead of the previous macro based one)

I am not sure that I follow you, the second arm will be a dead one. Rust does not forbid creation of dead match arms, but it can lint against them.

Of course you can't reinterpret &A as &enum(A, B), you can only convert it to enum(&A, B). (note that memory layouts of A and enum(A, B) are different, so you can't simply cast, you have to convert) Am I misunderstanding something?

If you have to distinguish between left and right even if they have the same type, you should not use sum-enums. Right tools for the job and all. You could even use wrappers enum(Left<A>, Right<B>), but I think it will be less ergonomic compared to the existing Either enum.

skysch · November 7, 2018, 9:42pm

Overall, I like this proposal. I think the normalization properties (enum(A, B) = enum(B,A) etc., ) provide some real potential to simplify some code, and that the ability to remove unnecessary semantic signaling (enum variant names) for impl Trait is valuable and clean.

Downside is footguns wrt generics; people might potentially want to forgo traditional enums when the variant actually matters. Returning enum(A, B) really asserts that A and B should be treated uniformly with all other As and Bs, even when your choice of A or B is meant to signal something to your user. This would be suspicious.

RustyYato · November 7, 2018, 9:54pm

I liked this proposal a bit more when I realized that it almost provides type level sets, but I don’t know if it is worth the cost of how it plays with generics. The other proposal could be extended to handle enum impl Trait in the future, and it plays with generics much better than this one, which I find important.

RustyYato · November 7, 2018, 9:55pm

But they are not just anonymous enums, because of the rules around sum-enum unification. The proposal that spawned this one is closer to anonymous enums than this one is.

skysch · November 7, 2018, 10:03pm

Name bikeshed: I like the name auto-enum for these, because of a double meaning: they are “automatically normalized and generated by inference”, and they are enums with “self-describing semantics.” (Conversely, I’m opposed to any sum-type jargon. Not specifically because it is jargon, but because avoiding the jargon permits future modifications which may contradict a strict type-theoretical notion of sum-type.)

felix.s · November 7, 2018, 10:20pm

I also oppose calling them 'sum-enums'; although it's because, appropriately (ironically?) enough, that is already something of a misuse of jargon. Regular enums are already sum types. If anything, these proposed types represent amalgamated sums.

I'm not fan of the 'pushout' term even in category theory; 'amalgamated' is too unwieldy; 'auto-enums' sounds too non-specific to me. Maybe '[anonymous] conflating enums'? (As in, variants with identically-typed payloads are conflated instead of distinguished from each other.)

Soni · November 7, 2018, 10:45pm

Hmm.

What if you add pattern matching to generics:

fn foo<enum(A, B)>() -> enum(A, B)

Make them not a “real” type.

scottmcm · November 7, 2018, 11:40pm

I really like the collapsing behaviour, here. It makes total sense to me that (io::Error|io::Error) is the same as just io::Error – I’d be very glad to not have to handle both Left(io::Error) and Right(io::Error), since I can’t imagine I’d care.

I was also thinking that this would be quite elegant:

fn min_max<T, U>(a: T, b: U) -> ((T, U) | (U, T));

And it means that if you do call it with T == U, you just get a tuple out that you can irrefutably match immediately.

160R · November 8, 2018, 12:11am

Could we reuse become reserved keyword here?

On named type:

    become Foo { 
        u32, 
        u64 
    }

    fn foo() -> Foo {
        if condition() {
            1u32.into()
        } else {
            1u64.into()
        }
    }

On anonymous type:

    fn foo(val: Weak<str>) -> Result<i64, become { NoneError, ParseIntError }> {
        let strref = Weak::upgrade(val)? 
        let num = i64::from_str_radix(strref, 10_u32)?
        Ok(num)
    }

On anonymous impl:

    // output of this function is a sum over 2 anonymous `impl Trait` types,
    // variants are automatically collected by compiler
    fn foo<'a>(data: &'a [u32], f: bool) -> become impl Iterator<Item=u32> + 'a {
        if f {
            data.iter().map(|x| 2*x).into()
        } else {
            data.iter().map(|x| x + 2).into()
        }
    }

Topic		Replies	Views
Pre-RFC: type-level sets	18	2132	May 1, 2019
pre-RFC: anonymous enums language design	13	5533	March 25, 2019
Pre-RFC: Anonymous variant types language design	93	6164	March 25, 2019
[Pre-RFC] sum/union types language design	9	1142	May 13, 2023
Sum types in product types language design	27	1223	February 11, 2024