Ideas around anonymous enum types

EDIT: the following posts are the most notable ones (click on them; reading them is worth your time)

END OF THE SUM-UP


When reading the thread about go generics, I had an idea that I think is worth sharing.

Let's assume that

  • the syntax (i32 | &str) would create an anonymous enum with two variants (i32 and &str).
  • an anonymous enum automatically implements every trait that all of its variants implement.
  • if and match expressions would be extended so that each branch can return a value of a different type. The type of such an expression would be an anonymous enum (or a regular enum, if that enum implements the From trait for the type of each branch).
  • for can be used as an expression, and can iterate over tuples.
    • Inside the loop, the type of the value would be an anonymous enum of all the types in the tuple.
    • If all values produced by the loop share the same type, the value of the for expression would be a [T] or, if the size is known (no break/continue/…), a [T; n].
    • If the types differ, the values could be collected into a [(A | B | C | ...)], with A, B, C, … the types of the elements, or, if the size is known, into a tuple (A, B, C).
    • Of course, iterators would be extended to support the same functionality ((1f, 2u, "three").iter() would implement Iterator<Item=(f32 | usize | &str)>).

This opens up some interesting design ideas:

First, let's play with if blocks.

// explicitly create an anonymous enum
let x: (i32| &str) = 3;

let cond = true; // so that `y` holds the `i32` variant and the assertions below pass
let y: (i32| &str) = if cond {
    3
} else {
    "three"
};
assert_eq!(x, y);

let y = if cond {
    3
} else {
    "three"
} : (i32| &str); // using type ascription since `3` is ambiguous
assert_eq!(x, y);

let y = if cond {
    3i32
} else {
    "three"
};
assert_eq!(x, y);
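
For comparison, here is roughly what the if examples above cost in today's Rust. This is only a sketch: IntOrStr and pick are made-up names, and the From impls stand in for the implicit coercion the proposal would add.

```rust
// Hypothetical hand-written stand-in for the proposed `(i32 | &str)`.
#[derive(Debug, PartialEq)]
enum IntOrStr<'a> {
    Int(i32),
    Str(&'a str),
}

// The `From` impls play the role of the implicit coercion in the proposal.
impl<'a> From<i32> for IntOrStr<'a> {
    fn from(v: i32) -> Self {
        IntOrStr::Int(v)
    }
}

impl<'a> From<&'a str> for IntOrStr<'a> {
    fn from(v: &'a str) -> Self {
        IntOrStr::Str(v)
    }
}

// Each branch has a different type, so each one needs an explicit `.into()`.
fn pick(cond: bool) -> IntOrStr<'static> {
    if cond {
        3i32.into()
    } else {
        "three".into()
    }
}
```

The enum definition, the two From impls and the .into() calls are exactly the boilerplate the (i32 | &str) syntax would hide.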

Now, with loops:

let tuple = (1.0f, 2u, "three");
let (x, y, z) = for value in tuple {
    // value's type is `(f32 | usize | &str)`, so value implements `Debug`
    println!("{:?}", value); // values of heterogeneous types can be used as long as they share some trait in common, but only through those traits
    value
}; // implicit `.collect()` at the end of a for expression
// x is an f32, y a usize, and z a &str
assert_eq!(tuple, (x, y, z));
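
The closest thing available today is probably to borrow each field as a trait object. A minimal sketch, assuming Debug is the only trait the loop body needs (debug_all is a made-up helper name):

```rust
use std::fmt::Debug;

// Collect the `Debug` output of each element of a heterogeneous tuple.
fn debug_all(tuple: &(f32, usize, &str)) -> Vec<String> {
    // A tuple cannot be iterated directly, so each field is borrowed
    // as a `&dyn Debug` and the borrows are put into an array.
    let items: [&dyn Debug; 3] = [&tuple.0, &tuple.1, &tuple.2];
    items.iter().map(|item| format!("{:?}", item)).collect()
}
```

Unlike the proposed for expression, this goes through dynamic dispatch and cannot hand the original values back as a tuple.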

and finally, let's play with functions:

trait SuperTrait {}
trait SomeTrait: SuperTrait {}

fn test<T: SomeTrait>(value: T) -> bool;

struct A;
struct B;
struct C;
impl SomeTrait for A {}
impl SomeTrait for B {}
impl SomeTrait for C {}

fn foo(values: (A, B, C)) -> impl SuperTrait {
    for value in values {
        // `values` is a tuple of type `(A, B, C)` so the type of `value` is
        // the anonymous enum `(A | B | C)` which implements `SomeTrait`.
        if test(value) {
            return value; // the returned type can be any of `A`, `B` or `C`
        }
    }
}
// the concrete return type of `foo` is `(A | B | C)` which implements `SomeTrait`
// and `SuperTrait` (since `SuperTrait` is required by `SomeTrait`).

I think it's really nice how everything could fit together. Everything would still be statically typed, yet it feels like using a dynamically typed language.

5 Likes

The usual stumbling block for anonymous enum proposals is that nobody can seem to come up with a good syntax for matching on the variants. But using anonymous enums solely for their trait impls completely bypasses that problem, so that is an interesting idea.

As you may already know, I've wanted enum impl Trait return types for a long time. That's usually considered separate from anonymous enums because enum impl Trait wouldn't require you to spell out the variants anywhere or allow you to match on them, but it looks like you've essentially merged the two ideas. In past discussions everyone seemed to agree that just impl Trait shouldn't autogenerate an enum and we wanted some kind of explicit opt-in marker like the enum keyword, but I don't recall any knockout argument for that so maybe I could be talked out of it.

My main knee-jerk concerns are that:

  • I'd expect this to wreak havoc with type inference, but we'll need someone far better informed than me to comment on how big of a problem that really is
  • If we're only using trait impls and not proposing a match syntax, then it's unclear how many of the use cases for typical anonymous enum proposals are covered, so this might be little more than two alternate syntaxes for enum impl Trait.
2 Likes

One option is to build on top of generalized type ascription to solve it:

I would love something like that. Currently writing generic numeric code in Rust is quite tricky and verbose, and most of the time it just needs (f32 | f64).

7 Likes

And we are back to the "Simplify error handling" proposal, where I even showed a prototype: https://crates.io/crates/ferris-extensions

All these discussions show that the community needs this; I hope it finally gets pushed through.

I also like the idea from "Pre-RFC: sum-enums" of using the enum keyword:

enum(NoneError, ParseIntError)
5 Likes

To me, at least, there's only one syntax I've ever seen brought up that isn't horrible:

Given that, the stumbling block is instead arguments over whether enum(i32 | i32 | i32) and enum(i32 | i32) are the same type, and corresponding specialization-like questions around whether enum(&'a i32 | &'b i32) can be a legal type.

And don't forget everyone's favorite of arguing that enum(A | B | C) and enum(A | enum(B | C)) should flatten and be equivalent

or less controversially, enum(A | B) vs enum(B | A)

There are lots of annoying little corners to specify where people disagree on what the "obvious" semantics are.

8 Likes

This looks like a very interesting take on the problem as it addresses several issues with the same conceptually simple solution: sum union types with set semantics. The argument against implicit type widening seems weak to me because the very same bug exists without sum types already.

I’d be really interested in helping push this forward — where shall I start?

EDIT: by reading the other linked threads it became clear that the terminology here says “union type” where I said “sum type”, apologies for the mixup.

That can't possibly work – it would require making every trait into a lang item. This is impossible for 3rd-party traits and such special-casing is still highly undesirable for all core/std traits.

Please just don't. That's basically breaking type checking and makes Rust weakly-typed, while the From thing is just too magical for no good reason.

Just… why? What's the motivation that would warrant such a radical change? A lot of things can go wrong if we gradually start allowing all sorts of crazy things.

The restrictions Rust imposes are not mere accidents or oversights — they are in place to protect programmers from their own mistakes. Stuffing the language full of footguns like this, just because "they open up interesting ideas" really strikes the wrong balance. The bar for introducing additional complexity to the language should be much, much higher than "I would find this useful in some cases". Otherwise the language would end up being a complete mess.

15 Likes

That would be unsound. For example, types that implement the unsafe Pod trait mustn't have padding bytes. (u8 | u64) would automatically implement Pod, even though it can have padding.
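
To make the padding concrete, here is a small check with a named enum standing in for (u8 | u64). It is only a sketch: ByteOrWord and enum_size are made-up names, and since the layout of a default-repr enum is unspecified, the exact size is not guaranteed.

```rust
use std::mem::size_of;

// Hypothetical named stand-in for the proposed `(u8 | u64)`.
#[allow(dead_code)]
enum ByteOrWord {
    Byte(u8),
    Word(u64),
}

// Size of the enum in bytes. It has to hold a discriminant plus the
// largest payload, rounded up to the alignment of `u64`.
fn enum_size() -> usize {
    size_of::<ByteOrWord>()
}
```

On common targets this is larger than the 9 bytes needed for a one-byte tag plus a u64, and when the Byte variant is active most of the payload area is unused as well.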

Even if we excluded unsafe traits, I doubt that it would be sound. However, I could imagine a concept similar to auto traits. Let's call them composable traits for now:

composable trait Foo {}
impl Foo for i32 {}
impl Foo for &str {}

// (i32 | &str) implements Foo

Unlike auto traits, composable traits may have methods and associated types and constants. Note that neither auto traits nor composable traits need to be lang items @H2CO3.
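
As a rough sketch of what such a composable trait would presumably expand to, here is the forwarding impl written by hand for a named enum. IntOrStr and the describe method are made up for illustration; nothing here is generated by the compiler today.

```rust
trait Foo {
    fn describe(&self) -> String;
}

impl Foo for i32 {
    fn describe(&self) -> String {
        format!("integer {}", self)
    }
}

impl<'a> Foo for &'a str {
    fn describe(&self) -> String {
        format!("string {:?}", self)
    }
}

// Hypothetical named stand-in for `(i32 | &str)`.
enum IntOrStr<'a> {
    Int(i32),
    Str(&'a str),
}

// The forwarding impl: every method matches on the variant and delegates.
impl<'a> Foo for IntOrStr<'a> {
    fn describe(&self) -> String {
        match self {
            IntOrStr::Int(v) => v.describe(),
            IntOrStr::Str(v) => v.describe(),
        }
    }
}
```

Methods taking &self forward mechanically like this; associated types and constants are where the variants could disagree, so they would need extra rules.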

I agree with @H2CO3 that this is a pretty radical change that would require a separate RFC. When you write an RFC for anonymous enums, you can add it to the "Future possibilities" section.

I don't think that's a good idea, for two reasons. First, it breaks type inference:

if foo {
    0usize
} else {
    1 // this should be a usize
}

Second, it is implicit and can cause confusion, for example:

fn return_iter() -> impl Iterator<Item = i32> {
    if foo {
        vec![1, 2, 3].into_iter()
    } else {
        std::iter::once(42)
    }
}

Assuming that Iterator is composable, this would compile fine. However, it is not at all obvious that it returns an anonymous enum, since the branches have different types.

I think it would be better to make the coercion explicit:

fn return_iter() -> impl Iterator<Item = i32> {
    if foo {
        vec![1, 2, 3].into_iter() as (IntoIter | _)
    } else {
        std::iter::once(42) as (Once | _)
    }
}

FYI f and u aren't valid suffixes, you need to write 1f32 and 2usize.

7 Likes

Or you could just use something like Either and implement your trait for it (perhaps even with a proc macro, I don’t know if such a macro exists yet).
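
A minimal sketch of that approach for the return_iter example, with a hand-rolled Either (the either crate provides a more complete one). The foo parameter is added here only so that the snippet is self-contained.

```rust
// A minimal two-variant wrapper, written by hand for illustration.
enum Either<L, R> {
    Left(L),
    Right(R),
}

// `Either` is an iterator whenever both sides yield the same item type.
impl<L, R> Iterator for Either<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Either::Left(l) => l.next(),
            Either::Right(r) => r.next(),
        }
    }
}

// Both branches now have the same type: `Either<IntoIter<i32>, Once<i32>>`.
fn return_iter(foo: bool) -> impl Iterator<Item = i32> {
    if foo {
        Either::Left(vec![1, 2, 3].into_iter())
    } else {
        Either::Right(std::iter::once(42))
    }
}
```

The wrapping is explicit at each return site, which is the part that the anonymous enum proposals want to make implicit or at least shorter.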

Sidenote: Coming from Haskell, I’m a bit sad that Rust doesn’t have this type in the standard library.

2 Likes

How about this: type as pattern matching

1 Like

The main desire behind anonymous enum is that you don't want to worry about the precise definition and the exact type. Thus I wonder if this could be modeled with existential types in mind for the type itself but require a precise type for the purpose of matching, which arguably is concerned with the representation.

Create

  • A new lang_item trait: trait Enum<OneOfThisTuple>, see below. This marker trait denotes the set of enums that have variants for exactly each of the types in the tuple (potentially multiple variants and each type might be mentioned multiple times).
  • A new lang_item struct: struct Anonymous<OneOfThisTuple>. These represent the canonical form of the above trait. They can be coerced into larger ones, which is the major selling point of having them be a lang item. They can also be created from marked user-defined enums, see below.
  • Possibly some shorthand to make it easier to write the type: let x: (i32 | &str) = 3.
  • Let fitting enums declare that they are a representation for such an anonymous enum:
    enum Either {
        Left(i32),
        Right(u32),
    }
    // Like a Marker impl, compiler checks variants.
    impl Enum<(i32, u32)> for Either {}
    // Now this coercion works:
    Either::Right(0) as (i32 | u32)
    

This solves the matching problem, fundamentally by offloading it to the user. Any match must first coerce the value into a concrete enum defined the usual way. This conversion can be defined, as an intrinsic, to pick the first variant with a fitting type in declaration order. That also solves the problem of Enum<(T, U)> potentially decaying to Enum<(T,)> when T == U, since both would be converted into the first variant. Matching simply uses the named variants of the proper enum, so no special syntax is needed. Local type definitions will likely help make this useful at the site of use.

fn duck(val: (i32 | u32)) {
    enum Repr { A(i32), B(u32) }
    impl Enum<(i32, u32)> for Repr {}

    match val as Repr {
        Repr::A(_) => {},
        Repr::B(_) => {},
    }
}

On the note of automatic trait impls, this could be restricted to object-safe traits, as those can be made to work by matching on all variants with the fitting ref/ref mut qualifier and relying on coercion.

let disp: &dyn fmt::Display = match val as Repr {
    Repr::A(ref v) => v,
    Repr::B(ref v) => v,
};
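
The coercion part of this already works today with an ordinary named enum. A minimal sketch (as_display is a made-up helper; the val as Repr conversion from the proposal is left out because it does not exist):

```rust
use std::fmt;

enum Repr {
    A(i32),
    B(u32),
}

// Each arm borrows a different payload type; both coerce to the
// same trait object because the return type is known.
fn as_display(val: &Repr) -> &dyn fmt::Display {
    match val {
        Repr::A(v) => v,
        Repr::B(v) => v,
    }
}
```

This is the per-trait boilerplate that an automatic impl for object-safe traits would write for you.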
3 Likes

I would like to see anon-enums added to rust as I think they would be very useful for creating better error handling code.

Personally I see Anon-enums to be a lot like tuples and could take a page from their book.

let anon_enum : (u32 | i32) = 5u32;

match anon_enum {
   ref.0(unsigned) => (),
   ref.1(signed) => (),
}

I used ref.0 but other ideas could be Self.0 or enum.0

I don't think the match syntax should use names; it should use a tuple-style syntax instead. This avoids cases like (Vec<u32> | Vec<i32>), where we would need to come up with variant names for the two Vecs.

Flattening anon-enums in the type system doesn't feel like a hard requirement to me, because flattening tuples is not a given.

Additionally I would say that (A | B) != (B | A) .

I would say that one restriction anon-enums should have is that all variants must be unique. Multiple variants of the same type should be the domain of named enums.

let anon_enum : (u32 | u32); // Error: This is ambiguous.
let anon_enum : (u32 | u16) = 5u8; // Error: This assignment is ambiguous
let anon_enum : (u32 | &u32) = 5u32; // This is fine because the rhs is a variant

This is the key feature that I think anon-enums should have to some degree. I don't think that all traits need to be implemented, but I would like to see as many std traits implemented as possible, specifically traits like Error, Debug, Display, Send, and Sync. There are other suggestions above and elsewhere on how this could or can't be done, but I think this is definitely important for anonymous enums.

Could you explain why this particular thing is important? This seems like a very local kind of thing to use (much like anonymous structs/objects in C#, for example), where the details need not leak from the local scope.

1 Like

I think it would be important for error handling, which I see as a great use case for anon-enums and one of the primary reasons why I personally want to see them. Consider the following function, which can return two types of errors.

fn readNum(file_name : &str) -> Result<i32, (std::io::Error | std::num::ParseIntError)> {
    let mut file = File::open(file_name)?; // Can produce io::Error
    let mut contents = String::new();
    file.read_to_string(&mut contents)?; // Can produce io::Error
    let output = i32::from_str_radix(&contents, 10)?; // Can produce num::ParseIntError
    Ok(output)
}

Now, if I happen to care which error is being produced, I could match on them independently and perform separate logic depending on which error occurred. However, if I am simply logging the error, having the type (std::io::Error | std::num::ParseIntError) implement the traits shared by both variants (Debug, Error, …) would allow the handling code to look something like this.

match readNum("num.txt") {
    Ok(_) => (),
    // e impl Debug because io::Error and num::ParseIntError impl Debug
    Err(e) => println!("{:?}", e),
}

Without inheritance, the calling code would look more like this:

match readNum("num.txt") {
    Ok(_) => (),
    Err(e) => match e {
        ref.0(e) => println!("{:?}", e),
        ref.1(e) => println!("{:?}", e),
        // continue for larger anon-enums
    },
}
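
For reference, this is roughly what the same function needs in today's Rust. It is only a sketch: ReadNumError and read_num are made-up names, and the parsing uses trim and parse instead of from_str_radix so that a trailing newline in the file is tolerated.

```rust
use std::fmt;
use std::num::ParseIntError;

// Hand-written stand-in for `(std::io::Error | std::num::ParseIntError)`.
#[derive(Debug)]
enum ReadNumError {
    Io(std::io::Error),
    Parse(ParseIntError),
}

// The `From` impls let `?` convert each underlying error automatically.
impl From<std::io::Error> for ReadNumError {
    fn from(e: std::io::Error) -> Self {
        ReadNumError::Io(e)
    }
}

impl From<ParseIntError> for ReadNumError {
    fn from(e: ParseIntError) -> Self {
        ReadNumError::Parse(e)
    }
}

// Forwarding `Display` by hand, one arm per variant.
impl fmt::Display for ReadNumError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReadNumError::Io(e) => write!(f, "{}", e),
            ReadNumError::Parse(e) => write!(f, "{}", e),
        }
    }
}

impl std::error::Error for ReadNumError {}

fn read_num(file_name: &str) -> Result<i32, ReadNumError> {
    let contents = std::fs::read_to_string(file_name)?; // can produce io::Error
    let output = contents.trim().parse::<i32>()?; // can produce ParseIntError
    Ok(output)
}
```

Everything except read_num itself is boilerplate that an anon-enum with forwarded Debug, Display and Error impls would make unnecessary.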

Random small idea: be more explicit.

type ReadNumErr: Error = (io::Error | ParseIntError);
fn readNum(file_name : &str) -> Result<i32, ReadNumErr>;

// or a potential inline syntax
fn readNum(file_name : &str) -> Result<i32, impl Error(io::Error | ParseIntError)>;
// or
fn readNum(file_name : &str) -> Result<i32, (io::Error | ParseIntError) impl Error>;

Explicitly say what traits are forwarded, and that triggers the codegen to go generate the forwarding impl.

This is probably way too heavy, though, and it would make sense to forward any object-safe trait.

Key Takeaway

Structural sum types (anonymous enums) are really primarily useful as a kind of trait object. They're a fancy stack-allocated dyn Any where you know the potential types.

The problem with this (as well as flattening, or order independence, or anything other than the simple expansion to a nominal sum type) is generics.

Ignoring syntax, consider the semantics of

fn or_0u8<T>(t: T) -> enum(T | u8) {
    if random() {
        0u8
    } else {
        t
    }
}

I can now call or_0u8(1u8) and get enum(u8 | u8). Or I could pass it an enum(i8 | u8) and get enum(enum(i8 | u8) | u8).
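
With the simple expansion to a nominal enum, the generic case stays unambiguous because variants are identified by position rather than by type. A sketch under that assumption (Or is a made-up type, and a bool parameter replaces random() to keep it deterministic):

```rust
// Hypothetical nominal expansion of `enum(A | B)`.
#[derive(Debug, PartialEq)]
enum Or<A, B> {
    First(A),
    Second(B),
}

fn or_0u8<T>(t: T, pick_zero: bool) -> Or<T, u8> {
    if pick_zero {
        // Always the second variant, even when `T` is also `u8`.
        Or::Second(0u8)
    } else {
        Or::First(t)
    }
}
```

Calling or_0u8(1u8, false) yields Or::First(1) of type Or<u8, u8>, and passing in an Or<i8, u8> yields a nested Or<Or<i8, u8>, u8>. Nothing is flattened or deduplicated, which is exactly what flattening or set semantics would have to change.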


I have three key opinions in this space:

  • "enum impl Trait" covers a significant amount of demand for anonymous enums, and should be pursued first for the stable language. (It would probably be built on top of an unstable structural enum feature.)
  • Errors are the primary use case that wants something like structural enums, to serve an error handling pattern similar to checked exceptions, e.g. throws io::Error, ParseIntError. I.e. "I can run into any of these problems, you deal with it." It's somewhere between an actual new error type and using a fully dynamic error type such as anyhow::Error.
  • The error handling case would like to forward every object-safe trait, so the error can still be used without having to match over the actual type
    • But, it also really wants the extra wrapping to go away when the error is actually converted to a trait object, because the structural sum error really is just a fancy trait object with more type information.

(The "checked exception" case is also the case that specifically would love to have both flattening and order independence ("rethrows (A|B) or throws (C|A)" is the same as "throws (A|B|C)"), as there the only meaningful information you're trying to convey is the type of the error that happened, not where or why the error happened (at least if you're emulating actual checked exceptions).)

4 Likes

In the "type level sets" proposal you would get the type i8 | u8, so I don't see a problem here. Yes, matching on T | u8 with T = i8 | u8 would result in overlapping arms, but overlapping match arms are a well-understood possibility, so I don't think it's a problem.

Personally I am against enum impl syntax. I believe anonymous enum is an implementation detail and should not be exposed as part of signature. Instead I think we should do conversion at the return site, e.g. by using the become keyword.

Note when I say enum impl Trait I am only referring to the feature and not any specific syntax expression of said feature (which is ad hoc enums returned in an impl Trait).

This is agreeable to me. Although I would prefer that the Error trait (and other standard traits) became an auto inherited one eventually. This syntax could also be used for user generated traits. I can also see this potentially being used to have the user derive traits useful to them that the library crates they are using didn't.

This could work well as an initial implementation, that becomes redundant in some cases over time as the functionality to infer this relationship is added.

This is interesting, I didn't consider that case. I don't think having (u8 | u8) exist at runtime (or even at compile time) is the problem. The reason I brought up only allowing one variant of any given type is that I didn't want ambiguity as to which variant would be assigned. Given

fn or_0u8<T>(t: T) -> enum(T | u8) {
    if random() {
        0u8
    } else {
        t
    }
}

I think it is clear to a human reader that the first branch assigns to the second variant and the second branch assigns to the first variant. I think the implementation should also make this distinction. (Although maybe this generic information is lost by the time this check would take place, I don't know.) Essentially

let anon_enum : (u8 | u8) = or_0u8(5u8); // This is ok
let anon_enum : (u8 | u8) = 5; // This is not

Now, as far as the type system is concerned, I believe (u8 | u8 | u8) != (u8 | u8). That being said, the compiler could and probably should make the optimization if available.

I see your point about flattening, and now agree it is useful, although I see it as most useful in assignments, when a function would like to return (A | B | C) as opposed to (A | (B | C)). So in cases where every variant on the rhs can be coerced into a variant of the lhs without ambiguity, the assignment should be valid. However, they would still be different types.

let anon_enum : (A | C) = c; // valid
let anon_enum2 : (A | B | C) = anon_enum; // valid

let anon_enum : (B | (A | C)) = c; // valid
let anon_enum2 : (A | B | C) = anon_enum; // valid
anon_enum == anon_enum2 // Error can't compare type (B | (A | C)) with (A | B | C)
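
With named enums, this kind of flattening assignment corresponds to a From conversion between distinct types. A small sketch of what the coercion would presumably do under the hood (AOrC, Nested and Flat are made-up stand-ins for (A | C), (B | (A | C)) and (A | B | C)):

```rust
#[derive(Debug, PartialEq)]
struct A;
#[derive(Debug, PartialEq)]
struct B;
#[derive(Debug, PartialEq)]
struct C;

#[derive(Debug, PartialEq)]
enum AOrC {
    A(A),
    C(C),
}

#[derive(Debug, PartialEq)]
enum Nested {
    B(B),
    Inner(AOrC),
}

#[derive(Debug, PartialEq)]
enum Flat {
    A(A),
    B(B),
    C(C),
}

// `(A | C)` into `(A | B | C)`: every variant has an unambiguous target.
impl From<AOrC> for Flat {
    fn from(v: AOrC) -> Self {
        match v {
            AOrC::A(a) => Flat::A(a),
            AOrC::C(c) => Flat::C(c),
        }
    }
}

// `(B | (A | C))` into `(A | B | C)`: the inner enum is flattened.
impl From<Nested> for Flat {
    fn from(v: Nested) -> Self {
        match v {
            Nested::B(b) => Flat::B(b),
            Nested::Inner(inner) => inner.into(),
        }
    }
}
```

Nested and Flat remain different types here, so comparing them directly is still an error, matching the last line of the example above.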

I am still in favor of order dependence, as the matching scheme I provided earlier (and currently prefer) requires you to know the order and have it be deterministic. I'm open to other forms of match; however, I don't really like the 'make it up by type' approach, especially if (u8 | u8) is legal.

fn readNum(file_name : &str) -> Result<i32, (io::Error | ParseIntError)>;
// I believe this is the equivalent throws syntax
fn readNum(file_name : &str) -> i32 throws io::Error, ParseIntError;
// And then use the match syntax
match readNum("num.txt") {
    ref.0(ioError) => (), // ref.0 is known to be io::Error
    ref.1(parseIntError) => (), // ref.1 is known to be ParseIntError
}
// Example when matching on a (u8 | u8)
match or_0u8(5u8) {
    ref.0(unsignedByte) => assert!(unsignedByte == 5u8),
    ref.1(unsignedByte) => assert!(unsignedByte == 0u8),
}