Pre-RFC: sum-enums

Very in favour of this. It seems ridiculous that we have anonymous product types (tuples) but not anonymous sum times. There have been many, many situations in which having such a feature would improve usability.

That said, I have serious disagreements with the match syntax proposal. It means you get problems when matching on an enum that looks like enum(T, T) in generic code. It also makes the subtle but problematic assumption that the only distinguishing meaning of a field is its type (tuples set the precedent that their position in the tuple holds meaning too: we can declare a (i32, i32) and have each field mean a different thing). Instead, I believe we should introduce some sort of match syntax that respects the placement in the type declaration of the value, similar to the precedent set by tuples. Something like this perhaps:

let e: enum(i32, bool) = enum(true);

match e {
    enum(!, x) => println!("It's a bool, {}", x),
    enum(y, !) => println!("It's an i32, {}", y),
}

This would remove the match ambiguity problem.

2 Likes

Tuples aren't setting a precedent here, enums are. Enum variant specification order has no impact on the behavior of enums, so it shouldn't impact this either.

5 Likes

Considering sum type nature of sum-enums, they have the following properties: [ā€¦] enum(A, B, A) is equivalent to enum(A, B)

That is then definitely not a sum type (with all their implementation and conceptual problems). I think calling it as such is even more confusing than a "union type" (because those who would confuse them probably don't even know about unions in Rust, which are most useful for FFI purposes). I'd prefer something like "normalized variant" or "normalized alternative" types for avoiding confusion with unions and real sum types (enums).

into is required for conversion into implicitly constructed sum-enum [ā€¦] The main way to create sum-enums will be to use Into trait implementation.

If we are adding a core language feature, I'd opt for making this a coercion instead, guided by type inference. This would have two advantages:

  • It seems to me that you want to make the enum-like nature of these types unimportant, or treat them like so (because you mentioned that enum(A) == A, unlike a single-variant named enum). This means that it is semantically redundant to have to use .into() because the fact that there's a discriminant and enum-like behavior is basically an implementation detail at this point (especially when it's in a "return impl Trait context). Just like you (hopefully) don't write fn foo<T>(x: T) -> T { x.into() }, and just return x directly.
  • Automatically implementing Into would presumably require compiler support, either via making Into or From lang items, or otherwise baking them into the language. The beauty of these two conversion traits is that they don't require this. This is a pretty big conceptual change and a point of no return, so I'd be wary of committing to it.

In cases when ordering of type arguments is required it can be done based on TypeId of each type. Alternatively for tag we could use TypeId directly

Doesn't TypeId have a non-uniqueness bug? I mean, I get that it's a hash, but last time I read about it, IIRC some of the compiler/runtime developers warned that it has a way higher than negligible/acceptable chance of collision for realistically complex types. If this is the case, using it as a discriminant or an ordering key can cause unsoundness and memory unsafety bugs, because it would result in accidentally interpreting values as the wrong type.


Others have already mentioned the problematic parts of the interaction with generics (which I mostly agree with), so I won't reiterate them here.


Unresolved questions

Interaction with lifetimes. To start things off we could restrict sum-enum usage only with types which ascribe to 'static

I might be naive, but isn't the lifetime of a union of types simply the intersection of the lifetimes of its components?


impl Trait variations: some have proposed to use enum impl Trait

IMO enum impl Trait is just noise (actually, worse, the leaking of an implementation detail).


Syntax bikeshed: enum(T, U) is meh at best, because that looks like a tuple. I think Centril proposed an alternative syntax in the other thread T | U which is nicer to read, as it is evocative of semantics beacuse it denotes the "or" logical operation (corresponding to set union). It also gets rid of a pair of parenthesis and the keyword, but that's a minor point.

2 Likes

To clarify, the usual formulation of the enum impl Trait proposal is that the enum (or other marker) would not be a "real" part of the public API signature, i.e. it wouldn't be rendered by rustdoc, wouldn't affect the function's type, can't be depended on by clients in any way, etc. Same as how mut on function arguments is not a "real" part of the signature today.

2 Likes

In my (not the best) understanding sum type is a disjoint union of types, thus A āˆŖ B āˆŖ A = A āˆŖ B. But I agree it's not the best name (see "about terminology" section). I was thinking about using "set" somehow. Maybe something like "types set"?

Good points. One option is to use keyword for explicit conversion point, lets reuse enum keyword for now, then code will look like this:

fn foo(val: Weak<str>) -> Result<i64, enum(NoneError, ParseIntError)> {
    let strref = Weak::upgrade(val).map_err(|e| enum e)?;
    let num = i64::from_str_radix(strref, 10_u32).map_err(|e| enum e)?;
    Ok(num)
}

fn foo<'a>(data: &'a [u32], f: bool) -> impl Iterator<Item=u32> + 'a {
    if f {
        // note that `enum` just converts value to the inferred sum-enum
        // and does not return it
        let iter = enum data.iter().map(|x| 2*x);
        return iter;
    } else {
        enum data.iter().map(|x| x + 2)
    }
}

Code becomes less implicit, and such keyword will solve problem with casting subset into a full set. But I don't quite like that you'll have to use map_err for multiple errors use-case. I guess we could make the following code legal:

// works because enum(u32) == u32
fn foo() -> u32 { enum 1u32 }

And desugaring ? with unconditionally added enum.

If we'll truncate TypeId to u8/u16, then we'll have to deal with collisions wither way. The reason for utilizing TypeId is to make easier conversions between different sum-enums for most cases.

I think we should treat &'a [u8] and &'b [u8] as effectively different types in this context, i.e. enum(u32, &'a [u8], &'b [u8]) will not be unified into enum(u32, &[u8]), but I am not sure if it's a good approach.

I completely agree with you here.

The motivation was to reuse intuition around matching enums and existing keyword. T | U is indeed a nicer syntax and makes it natural that (A | B) | C == A | B | C == B | A | C and A | ! == A, so I am not against replacing enum(..) with it.

2 Likes

To clarify, I didn't mean that. At least I interpreted the other alternative in the following way: we either use TypeId and trust that it's reasonably collision-free, xor we use a tag modeled after today's enums, whereby it's mostly just increasing (or at least provably unique) integers in the smallest possible unsigned type, and in that case there would be no direct correspondence between component TypeIds and variant tags.

Converting the expressions into a union type by re-using the enum keyword (for now) looks good to me. Agreed that implicitness could be a surprise.

Of course, but how/why does that contradict what I wrote?

I suggested that union of &'a T and &'b T could be &'c T where 'c is the intersection (overlap) of the regions 'a and 'b (i.e. 'a: 'c, 'b: 'c if you will, but with the additional constraint that it's the longest possible such lifetime, for obvious convenience reasons).

I think that should be legal by default anyway. Why would it be disallowed if enum(T) == T?

1 Like

Iā€™ve read the OP and looked at the examples, how does this solve the second problem? I donā€™t actually see where two types are being returned? Also union-type-enums is a wayyyy better name than sum enums, I know you were worried about overloading on ā€œunionā€ but when someone talks about unions in this way I donā€™t think of union struct concept, especially when the word type is involved. When I saw sum enums, I thought of arithmetic sums, not type unions.

In this code:

    if f {
        // `into` is required for conversion into implicitly constructed
        // sum-enum
        data.iter().map(|x| 2*x).into()
    } else {
        data.iter().map(|x| x + 2).into()
    }

The map methods return different types because each closure has a unique type, and thus they cannot be unified for impl Iterator, which must denote a single type. This proposal solves this by allowing type inference to recognize this situation and implicitly construct an enum(A, B) which implements Iterator and return that instead. (This is signaled with the into method calls.)

The word sum is actually pretty standard terminology for this kind of thing. If you think of tuples as a Cartesian product (not an arithmetical product), then you can see (EDIT: tagged) union types as a kind of "Cartesian sum" by analogy. In this more general setting, there are actual laws that product and sum types obey which are closely analogous to the laws of arithmetic, so it is not an unmotivated analogy.

1 Like

However, sum types don't permit an overlap. T + T is not the same as T union T; the former is what you get from a 2-variant enum each having associated data of type T today; the latter would be just T itself under this proposal.

3 Likes

That you can write match x { x: enum(A|B) => ..} and match x { x: A => ..} in different places, and have the arms evaluate, implies that x <: enum(A|B) and enum <: A. If you unify, you expect the subtype relationship A <: enum(A|B). Because &_ is covariant, you get the conclusion that &A <: &enum(A|B), which is a contradiction with Rust's guarantees (i.e., if A <: B, then &A can be freely converted to &B in-place).

enum K { A(A), B(B), } behaves like a sum. enum(A, B) behaves like a fibered sum (i.e., a sum with collapse of equal factors). This violates the principle of least surprise, and introduces the footgun I described. Calling enum(A, B) a sum is definitionally wrong.

5 Likes

The type system cannot do this because the actual values of lifetimes are not part of the type system. To my understanding, at the point when the borrow checker actually runs on the code to assign these values, all types are already specified well enough that you can perform monomorphization.

I think this is backwards. (it says that if 'a with 'static, then 'c is 'static)


My general gut reaction is that I prefer the positional variant. Type-directed lookup has waaaaay too much room for bikeshedding.

The commutativity and especially the way that multiple types can absorb into one feel extremely treacherous conceptually. I see a lot of reasoning based on equality of types, but type inference doesn't use equality; it performs unification.

Picture trying to unify enum(Vec<T> | Vec<i32>) with enum(Vec<_> | _)...

5 Likes

Indeed, sorry. So the correct notation would have been 'a: 'c, 'b: 'c.

I like this pre-RFC much more than the RFC. One thing that makes me a bit uncomfortable is that the order of types in sum-enums does not matter, making it a potentially very large breaking change to refactor a sum-enum out of a function into a proper enum.

One way to avoid this could be to just state that sum-enums cannot contain the same type twice, but that the order of the types relatively to one another matters. I donā€™t know if this is a good idea or not, given that one way to refactor a sum enum out of a function is to just write a pub enum(...);.

This is unacceptable because then uou can't do anything with generics.

2 Likes

you mean something like

fn foo<enum (A, B)>(a: A, b: B) -> enum (A, B) {
if A::typecmp<B>().is_gt() {
return a;
} else {
return b;
}
}

?

But what if I called this

foo(1i32, 2i32)

What would the error message be. What if this generic was nested under 5 layers of other functions, far removed from me?

itā€™s like calling it foo::<enum(i32, i32)> and expecting enum(i32) to pattern-match enum(A, B) - itā€™s a type error.

if you call foo with generics your generics must also be enum(A, B)

it desugars to fn foo<A, B>() where enum(A, B)

you can also do fn foo<A>() where enum(A, MyThing)

etc

fn bar<T>(t: T, i: i32) {
    foo(t, i);
}

How about this? Say this is in some crate that Iā€™m using, and i try to call this like

bar(0i32);

What would the error message be?

1 Like
fn bar<T>(t: T, i: i32) {
    foo(t, i); // ERROR: T isn't bound to enum(T, i32)
    // help: add `where enum(T, i32)` to your function:
    // fn bar<T>(t: T, i: i32) where enum(T, i32) {
}

T isnā€™t bound to enum(T, i32) (in the crate, not in your code)

thus there is no bar because it doesnā€™t compile

So is enum(...) now a trait bound?