Ideas around anonymous enum types

My first argument is consistency: Rust is strongly typed, and (A, B) is not the same type as (B, A). I find that consistency useful.

My second argument is that knowing the order is required for index-based matching. If you have (A | B), ref.0 refers to A, and if you have (B | A), ref.0 refers to B.

My third argument is that this makes the generic cases consistent. Consider:

let anon_enum : (T | u8) = ...;
match anon_enum { // match anon_enum {
    t : T => (),  //     ref.0(t) => (),
    u : u8 => (), //     ref.1(u) => (),
}                 // }

Now if anon_enum is T we would expect it to use the first branch, and if anon_enum is u8 we would expect it to use the second branch. But what if T is u8?

With an order dependent anonymous enum this is resolved. If anon_enum was set through T it would be in the first variant, and if anon_enum was set through u8 it would be in the second. The behavior is consistent, regardless of the concrete type of T.
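
For what it's worth, the order-dependent behavior can be approximated in today's Rust. This is only a sketch: anonymous enums don't exist, so a hypothetical positional enum `Pos2` (with made-up variants `V0`/`V1` and a helper `which`) stands in for `(T | u8)`. The arm taken depends on the variant position, so instantiating `T` as `u8` changes nothing.

```rust
// Hypothetical stand-in for the proposed `(A | B)`. `Pos2`, `V0`, `V1`
// and `which` are illustration names, not part of any proposal.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Pos2<A, B> {
    V0(A), // what `ref.0` would refer to
    V1(B), // what `ref.1` would refer to
}

// Generic over T: the arm taken depends only on the variant position,
// never on what T turns out to be.
fn which<T>(e: &Pos2<T, u8>) -> &'static str {
    match e {
        Pos2::V0(_) => "T branch",
        Pos2::V1(_) => "u8 branch",
    }
}
```

With `T = u8`, `which(&Pos2::<u8, u8>::V0(1))` still reports the T branch and `which(&Pos2::<u8, u8>::V1(1))` the u8 branch.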

However, an order-independent anonymous enum would have inconsistent behavior here. First of all, it couldn't be index matched, only type matched, and it would have to use the concrete types. So if anon_enum is T, we should expect it to match on the first branch. But if T happens to be u8, which branch do you run? If anon_enum is u8, do you run the u8 or the T branch? If anon_enum is T, do you run the u8 or the T branch? Do you convert your u8 into T or your T into u8? You would be forced to change the type one way or another.

With order dependent anonymous enums I know that T will always match on the T branch and u8 will always match on the u8 branch, regardless of what T is. With order independent, T may match on the T branch and u8 may match on the u8 branch, but they may also match on the other branch. u8 may become T or T may become u8.

Personally I am against proposals which do not normalize anonymous enums, and your arguments in the beginning seem weak to me. I also think that matching of anonymous enums based on type ascription is really nice and natural (although there are some minor conflicts with parsing, so in the end we may need the |: syntax).

Regarding terminology, I think anonymous enums should be considered "true" sum types. enum in its current form essentially creates a new type for each variant, so you can consider:

enum Foo {
    A(u32),
    B(u32),
}
// to be sugar for
struct A(u32);
struct B(u32);
type Foo = A | B;

This view also supports normalization, since type Foo does not depend on the order of arms A and B.

It would follow the first arm, as has been said many times in this thread. This behavior would be consistent with how match works today. I really don't see how it can create problems in practice. We may add a (clippy?) lint which would ask to put more specialized cases first, but even without it I think such a feature will work just fine.
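
As a reminder of how match already behaves when arms overlap, here is a tiny sketch in current Rust. The function `classify` is made up for illustration; it only shows that the first matching arm runs.

```rust
// `classify` is a made-up example; it only shows today's arm ordering.
fn classify(n: u32) -> &'static str {
    match n {
        // Both guards accept 4, but only the first matching arm runs.
        x if x % 2 == 0 => "even",
        x if x < 10 => "small",
        _ => "other",
    }
}
```

Here `classify(4)` yields "even" even though 4 is also below 10.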

Consider the following:

trait TraitA {...};
trait TraitB {...};

let anon_enum : (A : TraitA | B : TraitB) = ...;
match anon_enum {
   a : A => ...,
   b : B => ...,
}

Both arms are generic. If A and B happen to be the same type, which one should lose its trait methods and adopt the trait methods of the other? Obviously it will be the one you put first. But why should we force one into the other when we could know whether anon_enum was A or B and maintain the trait relation?

With coercion and type matching we can provide an interface similar to what a flattening, order-independent enum would have, but give the user control. If they want A and B to be separate and consistent across implementations, that's the default behavior. If they want a special case that converts one to the other, they can specify it.

Could you please provide one concrete use case? I feel stupid, but I definitely fail to find one.

In my practice I never wanted to write code like in your example, and AFAIK none of the earlier proposals were suggesting anything like that (UPD: I misread your example like impl TraitA). Usually you either want all types in an anonymous enum to implement a single trait (e.g. Error), or you are interested in a concrete type after matching on such an enum.

type HttpError = anyhow::Error<String>;
type SqlError = anyhow::Error<String>;
let result : (HttpError | SqlError) = ...;
match result {
    _ : HttpError  => println!("Oh no the http request failed better run http repair logic"),
    _ : SqlError => println!("Oh no the sql statement failed, better run sql repair logic"),
}

I used type defs, but it is possible result is (anyhow::Error<String> | anyhow::Error<String>).

That's my bad, I got my terminology backwards. I shouldn't be posting when I should be asleep.

This, unfortunately, is a wrong interpretation of enum. Foo::A is not a type. It'd be nice if it was, but it isn't.

I meant "you could consider it from theoretical PoV", of course it does not work literally that way right now and it can't in principle without "true" type summation.

There are proposals like RFC 2593 that would make them types.

To me, having anonymous enums be positional and non-normalized would remove almost all of the potential ergonomics. If a function I call can return IOError | ParseError I don't care which one is first. I don't (and shouldn't have to) care that internally 3 nested functions are called and there are actually 3 sources of IOError and 2 of ParseError. It would seem to me that functions would then have to manually flatten the return values from nested function calls to hide internals.

But (A, B) is a product type, and (u8, u8) makes perfect sense and has 16 bits of total value space. Whereas A | B is a sum type like enum, and even with 2 u8 entries it only has 9 bits of value space. AFAIK you don't get a guarantee about regular enum discriminant ordering, either, unless it's a fieldless enum (?).
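
The 16 versus 9 bits can be checked by counting values: a product multiplies the counts, a sum adds them. A small sketch (the constant and function names are my own, purely illustrative):

```rust
// Counting inhabitants: a product multiplies, a sum adds.
const U8_VALUES: u32 = 256; // number of distinct u8 values

// (u8, u8): every pairing of two bytes.
const PRODUCT_VALUES: u32 = U8_VALUES * U8_VALUES; // 65_536 = 2^16

// (u8 | u8) with two distinct variants: a byte from either side.
const SUM_VALUES: u32 = U8_VALUES + U8_VALUES; // 512 = 2^9

// Bits needed to tell `n` values apart (n is a power of two here).
fn bits_needed(n: u32) -> u32 {
    n.trailing_zeros()
}
```

So the non-normalized sum is 8 bits of payload plus 1 bit of discriminant. A normalized (u8 | u8) collapsing to u8 would be back to 256 values.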

This to me is circular reasoning. Indexed-based matching shouldn't be necessary in the first place.

In this example HttpError and SqlError are the same type (!). This would make more sense to me if newtypes were used instead.

Which brings me to: how would you even create a value for a non-normalized enum? E.g., in a fn foo() -> u8 | u8, where both u8 are distinct, how would you assign a value or write a return statement?

I am confused: how is this different from Option<Option<T>> which AFAIK will combine the discriminants into a single discriminant?

It doesn't (to a first approximation).

I can turn &Option<Option<T>> into Option<&Option<T>> via Option::as_ref. The inner option is present and the discriminants can't be naïvely merged.
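
A minimal sketch of that point, using only `Option::as_ref` from the standard library (the wrapper function `inner_of` is a name I made up):

```rust
// `Some(None)` and `None` are different values of Option<Option<u8>>,
// and a borrow of the inner Option can be handed out on its own.
fn inner_of(outer: &Option<Option<u8>>) -> Option<&Option<u8>> {
    outer.as_ref()
}
```

Because the returned reference has to point at a real `Option<u8>` stored inside the outer value, the inner option needs a complete representation of its own, which is what rules out naive merging.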

As the author of the offhandedly mentioned Anonymous variant types RFC, I researched past proposals to determine where they fell short, and some of these shortfalls are being repeated as ideas in this thread.

Union types and implicit type flattening may be convenient in the simple case, but trying to specify their interactions with the rest of the type system quickly reveals runtime behavior pitfalls, a combinatorial explosion of cases, or additional new type system features that have to be used whenever values of those types are matched against. None of which are desirable.

As for the syntax I chose, (i32 | f32)::0 for the first variant might not look nice, but it is consistent, and having the type be a sum type makes it easier for macros to work with, so this issue should be mitigated with the help of the crate ecosystem.

And as for "why not an enum if you want an actual sum type"? In short, for the same reason as why tuples exist despite the presence of structs as a product type. The Either crate has already been brought up, and it definitely sees usage, but there's also an independent implementation of Either in the Futures crate. Despite being almost identical in nature and purpose, the two are separate types. The two types are in essence ersatz anonymous sum types of two variants, and if a proper anonymous sum type were available, they both could use it.
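
To make the duplication concrete, here is a sketch with two local modules standing in for two crates that each define their own two-variant enum. The module names `crate_a` and `crate_b` and the function `a_to_b` are hypothetical; this is not the actual source of either crate.

```rust
// Stand-ins for two crates that each define their own two-variant enum.
// The module names are hypothetical; only the shape matters.
mod crate_a {
    #[derive(Debug, PartialEq)]
    pub enum Either<L, R> {
        Left(L),
        Right(R),
    }
}

mod crate_b {
    #[derive(Debug, PartialEq)]
    pub enum Either<A, B> {
        Left(A),
        Right(B),
    }
}

// Identical shape, but nominally different types, so glue code is needed.
fn a_to_b<L, R>(e: crate_a::Either<L, R>) -> crate_b::Either<L, R> {
    match e {
        crate_a::Either::Left(l) => crate_b::Either::Left(l),
        crate_a::Either::Right(r) => crate_b::Either::Right(r),
    }
}
```

With a built-in anonymous sum type, both libraries could share the same type and this glue would not be needed.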

It certainly wouldn't be doable in a naive way in general, but FWIW according to this old post they are merged for nested Option at least:

PRESCRIPT

Hello if you are new to this thread, the broad idea currently being discussed is on post 36. Can't believe we are already on 54, but I don't want to repeat examples and logic too often.

Common discussion tends to revolve around whether anonymous enums should be order independent and implicitly flattened, or order dependent.

ok on to the reply.

Script

Not necessarily manually. I imagine in most cases the enum would be coerced to the function return type, which you would probably write in a flat representation anyway. In the following example, assume a is of type ((A | C) | (A | B)).

fn f() -> (A | B | C) {
   let a = ...;
   return a; // a is coerced into (A | B | C) because that's the return type
}

You can also layer type based matching on top of indexed based matching.

match f() {
   a : A => ...,
   b : B => ...,
   c : C => ...,
}

In most cases I doubt the user would even need to care about the order. Because in most cases you would probably write (A | B | C) as a return type, and use type matching sugar. I think you can achieve most of the ergonomics with indexed-based matching, coercion, and sugar.
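
A rough approximation of that coercion in current Rust, written by hand with named enums and a `From` impl. All the names here (`A`, `B`, `C`, `AorC`, `AorB`, `Nested`, `Flat`) are placeholders I picked, and the proposal would presumably generate the equivalent conversion implicitly.

```rust
// Hypothetical payload types and named enums standing in for the
// anonymous ones; none of these names come from the proposal.
#[derive(Debug, PartialEq)]
struct A;
#[derive(Debug, PartialEq)]
struct B;
#[derive(Debug, PartialEq)]
struct C;

enum AorC { A(A), C(C) }               // (A | C)
enum AorB { A(A), B(B) }               // (A | B)
enum Nested { Left(AorC), Right(AorB) } // ((A | C) | (A | B))

#[derive(Debug, PartialEq)]
enum Flat { A(A), B(B), C(C) }          // (A | B | C)

// The "coercion": every nested position maps to one flat variant.
impl From<Nested> for Flat {
    fn from(n: Nested) -> Flat {
        match n {
            Nested::Left(AorC::A(a)) | Nested::Right(AorB::A(a)) => Flat::A(a),
            Nested::Right(AorB::B(b)) => Flat::B(b),
            Nested::Left(AorC::C(c)) => Flat::C(c),
        }
    }
}

// Mirrors `fn f() -> (A | B | C)`: the nested value is flattened on return.
fn f(a: Nested) -> Flat {
    a.into()
}
```

Both sources of `A` end up in the same flat variant, so the caller never sees the nesting.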


That was the point of the example. Two variants of the same type can hold different meanings, which type-based matching (which needs to use the concrete type) couldn't differentiate, whereas index-based matching can.

As for the errors being newtypes, you don't always get to control that. The function you called could return an error from library1 and another from library2, and both could happen to be anyhow::Error.

The summary in post 36 (wow we are already on 54 I just wrote that summary) provides an example of how a (u8 | u8) could be created through generics and how index based matching makes this deterministic.


I just want to emphasize this point eaglegenes is making. Anonymous enums are not a new concept, and a lot of thought has been put into their possible implementation. Implicit flattening and order independence have always been considered, but in my opinion the idea that these are requirements for anonymous enums is what has caused this concept to stall for years.


Seeing this RFC and this suggestion gave me an idea. Let's say matching used the above style:

let anon_enum : (A | B | C);
match anon_enum {
   0(a) => ...,
   1(b) => ...,
   2(c) => ...,
} 

We could use a similar syntax for ambiguous assignments:

let _ : (A | A) = a; // error
let _ : (A | A) = 0(a);
let _ : (A | A) = 1(a);

I don't know if there would be a large use case for making these assignments. But if (A | A) has to be legal due to generics, this might be a way of manually setting the enum without having to jump through the generic hoops.

Since it hasn't been mentioned before, I think if Rust adds anonymous enums, they should be similar to Typescript union types.

For those who aren't familiar with it: Typescript treats types as sets. A type A | B is the union A ∪ B. This means that:

  • A | A is equivalent to A
  • A | B is equivalent to A, if B is a subtype (i.e. subset) of A
    • for example, (A | never) is equivalent to A
  • A | B is equivalent to B | A
  • A | (B | C) and (A | B) | C and A | B | C are equivalent
  • A can be coerced to A | B
  • A | B can be coerced to A & B (this is the intersection A ∩ B)
    • in Rust terms: A | B can be coerced to impl Foo, if A: Foo and B: Foo

In practice, this is very powerful and quite easy and intuitive to work with. There are many use cases for this, not just error handling.
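
For comparison, the last two bullets can be imitated in Rust today by hand. This is only a sketch with invented names (`Describe`, `Cat`, `Dog`, `CatOrDog`, `show`): a named enum gets a `From` impl for the "A to A | B" direction and a delegating trait impl so it can be passed as `impl Describe`.

```rust
// Invented trait and types, purely for illustration.
trait Describe {
    fn describe(&self) -> String;
}

struct Cat;
struct Dog;

impl Describe for Cat {
    fn describe(&self) -> String { "cat".to_string() }
}
impl Describe for Dog {
    fn describe(&self) -> String { "dog".to_string() }
}

// Named stand-in for `(Cat | Dog)`.
enum CatOrDog { Cat(Cat), Dog(Dog) }

// "A can be coerced to A | B", written out explicitly.
impl From<Cat> for CatOrDog {
    fn from(c: Cat) -> Self { CatOrDog::Cat(c) }
}

// "A | B can be used as impl Foo": delegate to whichever variant is present.
impl Describe for CatOrDog {
    fn describe(&self) -> String {
        match self {
            CatOrDog::Cat(c) => c.describe(),
            CatOrDog::Dog(d) => d.describe(),
        }
    }
}

fn show(x: impl Describe) -> String {
    x.describe()
}
```

A union-type feature would have to derive the delegating impl automatically, which is where the difficulties start.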

Unfortunately, automatically implementing traits for anonymous enums can be unsound, as I pointed out before. I hope that this issue can be solved though.

I don't think it's necessary to be able to distinguish A and (A | A). If you do, you're better suited with a normal enum with named variants. Or you can use tuple structs to distinguish them:

struct X(i32);
struct Y(i32);

fn xy() -> (X | Y) {...}

match xy() {
    X(_) => todo!(),
    Y(_) => todo!(),
}

Here's the disconnect as I understand it:

If you want anonymous union types for the explicit purpose of

  • an enum impl Trait feature,
  • "checked exception" like error handling (the only context of where the error came from is the type), or
  • (localized) dynamic typing (within a "small" set of types)

You'll really want the flattening/canonicalizing/deduplicating behavior. After all,

  • the only relevant information is what the concrete type is, and
  • who cares who set it to the type from where, each concrete type should always be treated the same, as
  • the purpose is explicitly being able to specialize based on which type of the union it is.

The main issues as I understand it with this are:

  • Rust is not a dynamic language, and this would be adding a dynamic feature to it.
  • Destructuring slipping from one arm to another based on generic instantiation is not "hygienic", in a strongly typed and hygienic type system.
  • A lot of subtle things have to change about how types are reasoned about to implement real anonymous union types, with tons of edge cases for the one feature.

As opposed to just having anonymous sum types, where

struct AB {
    a: A,
    b: B,
}

is to

(A, B)
// roughly
struct _ {
    0: A,
    1: B,
}

as

enum AB {
    A(A),
    B(B),
}

is to

(A | B)
// roughly
enum _ {
    0(A),
    1(B),
}

Those in favor of "anonymous sum types" are in favor of "anonymous enum"; those in favor of "anonymous union types" are in favor of an entirely new language feature.


To be clear, Typescript does this literally for free because every variable is roughly GcPtr<dyn Any>, and type unions are done via downcasting the type-erased pointers that the runtime works with. This makes sense for Typescript's runtime and the way that the JS runtime uses duck typing.

Rust doesn't. What you want really is dyn Any of some small set of known types which you can downcast to. i.e. enum impl Any. It doesn't make sense for "anonymous enum" to be enum impl Any.

(To be clear, though, "enum impl Any" would be more efficient to not actually use Any downcasting but instead discriminants for the known types.)
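
To illustrate the difference between the two strategies, here is a sketch using `std::any::Any` next to a plain enum. The function and type names (`describe_dyn`, `describe_enum`, `U8OrString`) are mine, and the type set is an arbitrary example.

```rust
use std::any::Any;

// Type-erased: the caller probes for each type it knows about.
fn describe_dyn(v: &dyn Any) -> &'static str {
    if v.downcast_ref::<u8>().is_some() {
        "u8"
    } else if v.downcast_ref::<String>().is_some() {
        "String"
    } else {
        "unknown"
    }
}

// Closed set: a discriminant records which type is present, no probing.
enum U8OrString {
    U8(u8),
    Text(String),
}

fn describe_enum(v: &U8OrString) -> &'static str {
    match v {
        U8OrString::U8(_) => "u8",
        U8OrString::Text(_) => "String",
    }
}
```

The `dyn Any` version has to cope with types outside the expected set, while the enum version is exhaustive by construction.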

I've "liked" both of the immediately-preceding two posts because, for me, they elucidate the dichotomy that has been part of this and predecessor threads on this topic.

One subject that seems to have been avoided – or perhaps I just missed it – is discussion of the teachability of "anonymous union types" vs "anonymous sum types" (see prior post), together with the likely impact that choice will have on the learnability of Rust.

Separate from the learnability issue, how much bad-mouthing is Rust likely to receive in the technical press for making either choice, and thus complexifying the language for new Rustaceans? Will either choice be likely to impact Rust's "most loved language" status in SO's ranking for the year(s) after feature introduction?

Can you help me understand what "dynamic" means here? Are you saying anonymous union types require dynamic/runtime support, and if so, what exactly does that mean in this context?

Dynamic here refers to dynamic the way Any is dynamic.

Up to now, in (stable) Rust, generics are effectively a black box. Whatever type is in it, it doesn't matter, as it's in the T box and you can't get it out or put something else in.

Due in no small part to this, while Rust's generics are monomorphized (for performance and abstraction stripping reasons), they could just as easily be polymorphic (modulo dynamically sized locals/alloca being hard). It's in that polymorphic view of things, where there isn't a copy-pasted version of the function for each input type, that a true union type's dynamism is most obvious.

Consider again or_u8.

fn or_u8<T>(t: T) -> ( T | u8 ) {
    if random() { t } else { 0u8 }
}

If we "template" monomorphize this, deduplication is obvious:

fn or_u8(t: u8) -> ( u8 | ) {
    if random() { t } else { 0u8 }
}

But if we polymorphize it, we have to dynamically check what the type is. In pseudo-Rust:

fn or_u8<dyn T: Any>(t: T) -> If<{T is u8}, (u8|), (T|u8)> {
    match T::downcast::<u8>(t) {
        Ok(t) => if random() { t as (u8|) } else { 0u8 as (u8|) },
        Err(t) => if random() { t as (T|u8) } else { 0u8 as (T|u8) }
    }.into()
}

It's specifically that union types see into a historically opaque generic and can alter behavior based on its concrete type fairly arbitrarily (breaking polymorphization) that I say proper union types are a dynamic feature.
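
The pseudo-Rust above can be approximated in compilable Rust with `std::any::Any`. This is a sketch under loud assumptions: the enum `OrU8` is my own stand-in for both `(u8|)` and `(T|u8)`, and `random()` is replaced by a `take_t` parameter so the result is deterministic.

```rust
use std::any::Any;

// Stand-in for the two possible return types of the pseudo-Rust version.
#[derive(Debug, PartialEq)]
enum OrU8<T> {
    Collapsed(u8), // stands in for `(u8 |)`, chosen when T is u8
    FromT(T),      // the `T` side of `(T | u8)`
    FromU8(u8),    // the `u8` side of `(T | u8)`
}

// `take_t` replaces `random()`; `T: Any` is what lets us peek at the type.
fn or_u8<T: Any>(t: T, take_t: bool) -> OrU8<T> {
    let erased: Box<dyn Any> = Box::new(t);
    match erased.downcast::<u8>() {
        // T turned out to be u8: the union deduplicates.
        Ok(byte) => OrU8::Collapsed(if take_t { *byte } else { 0 }),
        // T is something else: recover it and keep both sides distinct.
        Err(erased) => match erased.downcast::<T>() {
            Ok(t) if take_t => OrU8::FromT(*t),
            Ok(_) => OrU8::FromU8(0),
            Err(_) => unreachable!("the box was created from a T"),
        },
    }
}
```

The runtime type check is exactly the part that a black-box generic never needs, which is the sense of "dynamic" meant here.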

† Specialization makes things a bit murkier. But recently the scope of specialization has been pulled back massively for "minimal specialization", which still fits the polymorphization approach, specifically because of how it's restricted and entirely based on clearly subsetted trait impls.


Also, for those who want union types, how do you handle (&'a T | &'b T)? As far as rustc's middle is concerned, those two types stop being different after borrowck (i.e. functions aren't monomorphized for every single different lifetime they get) and there is no "lifetime equality" check possible between disparate lifetimes.

While much more involved for sure, I was always under the assumption that union types would take the "template monomorphization" approach as you describe it, and never considered a polymorphic version. For sure I wouldn't want a dynamic implementation in Rust, either! I still don't see why it would be required, though.

Regarding (&'a T | &'b T), that is of course the sticky question that needs to be answered. I wonder if a simple intersection wouldn't be a sufficient enough approximation. Even if this, or some other simple approach, results in a false lifetime error, I'd think this should be a rare enough case that requiring the user to resort to full enums to resolve it (with an explanation, of course) is fine.
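
One possible reading of the "intersection" idea, sketched in today's Rust: both references are shortened to a common lifetime that each of them outlives, so the result no longer records which lifetime it came from. The function `pick` and the `first` flag are my own illustration, not something proposed in the thread.

```rust
// Both inputs are shortened to a common lifetime 'c that each of them
// outlives, so the result forgets whether it came from 'a or 'b.
fn pick<'a, 'b, 'c, T>(x: &'a T, y: &'b T, first: bool) -> &'c T
where
    'a: 'c,
    'b: 'c,
{
    if first { x } else { y }
}
```

The cost is the one mentioned above: a caller who only ever takes the longer-lived side still gets the shorter lifetime back.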

There is actually another issue/quirk with union types that I'll throw out: that of structural composition. E.g., in a straightforward approach Result<(A | B), (X | Y)> results in Ok(A | B) | Err(X | Y) rather than the probably more useful Ok(A) | Ok(B) | Err(X) | Err(Y).
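
To show the two shapes side by side, here is a sketch with named enums in place of the anonymous ones. The payload types and the names `AB`, `XY`, `Flat` and `flatten` are arbitrary choices of mine.

```rust
// Named stand-ins for `(A | B)` and `(X | Y)`; payload types are arbitrary.
enum AB { A(u8), B(u16) }
enum XY { X(&'static str), Y(char) }

// The "probably more useful" flat shape: Ok(A) | Ok(B) | Err(X) | Err(Y).
#[derive(Debug, PartialEq)]
enum Flat {
    OkA(u8),
    OkB(u16),
    ErrX(&'static str),
    ErrY(char),
}

// Result<AB, XY> keeps the two-level structure; flattening is manual.
fn flatten(r: Result<AB, XY>) -> Flat {
    match r {
        Ok(AB::A(a)) => Flat::OkA(a),
        Ok(AB::B(b)) => Flat::OkB(b),
        Err(XY::X(x)) => Flat::ErrX(x),
        Err(XY::Y(y)) => Flat::ErrY(y),
    }
}
```

Nested patterns already let a single match reach all four cases, so the quirk is mostly about what the type itself looks like rather than what can be matched.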

On a bikeshedding note: if anonymous enums (rather than unions) become a thing, I'd much prefer the enum { u8, u16, &'static str } syntax proposed elsewhere over (u8 | u16 | &'static str), as this would IMHO much better capture the nature of the implementation.

Still not a fan, though.
