Ideas around anonymous enum types

CAD97 · August 5, 2020, 1:46pm

This is... exactly the use case for "any error" error enums, is it not? You have a type definition in one location that enumerates the potential errors of the stack, and you can add a new error type just to that localized error enum definition and to the root-level handling of the error.

If the goal of enum(...Error) is to behave like a stack dyn Error + Any, that should be explicitly stated (along with rationale for why it can't "just" be dyn Error + Any).
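For reference, here is a minimal sketch of what the heap-allocated dyn Error approach looks like in today's Rust. The names ErrorA, ErrorB, raise_ab and describe are placeholders chosen for illustration, not something proposed in the thread. Intermediate functions box whatever error occurs, and the root recovers the concrete type by downcasting.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct ErrorA;

#[derive(Debug)]
struct ErrorB;

impl fmt::Display for ErrorA {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "error A")
    }
}

impl fmt::Display for ErrorB {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "error B")
    }
}

impl Error for ErrorA {}
impl Error for ErrorB {}

fn raise_a(fail: bool) -> Result<(), ErrorA> {
    if fail { Err(ErrorA) } else { Ok(()) }
}

fn raise_b(fail: bool) -> Result<(), ErrorB> {
    if fail { Err(ErrorB) } else { Ok(()) }
}

// Intermediate function: `?` boxes whichever concrete error occurs.
fn raise_ab(fail_a: bool, fail_b: bool) -> Result<(), Box<dyn Error>> {
    raise_a(fail_a)?;
    raise_b(fail_b)?;
    Ok(())
}

// Root-level handling: recover the concrete type by downcasting.
fn describe(err: &(dyn Error + 'static)) -> &'static str {
    if err.is::<ErrorA>() {
        "A"
    } else if err.is::<ErrorB>() {
        "B"
    } else {
        "unknown"
    }
}
```

The signature of raise_ab says nothing about which errors can occur, and each error costs a heap allocation. Those are the two points the anonymous enum proposals try to improve on.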

steffahn · August 5, 2020, 1:50pm

I wanna reiterate my point above that a value that’s Sized cannot be “like dyn” in the sense that it actually implements Any itself based on dispatch. Well... “cannot” is maybe a bit too strong, but at least it’s difficult.

CAD97 · August 5, 2020, 1:55pm

You can trivially have a Sized dyn type if you know all the variants ahead of time purely by always allocating the maximum amount of space.

Rather than roughly

struct {
    tag: uEnough,
    payload: union { A, B, C, ... },
}

you have

struct {
    vtable: *const (),
    payload: union { A, B, C, ... },
}

There is no technical restriction on a Sized type with dyn semantics if the set of types is known ahead at compile time.

steffahn · August 5, 2020, 1:57pm

No, nothing’s “trivial” about this.

My point above is that previously safe code may become unsound, at least in the case of Any, when non-specializable blanket implementations are suddenly getting extra specialized implementations.

robinm · August 5, 2020, 1:58pm

EDIT: adding context quote

No, because the "any error" enum pollutes the intermediate functions with unrelated errors.

fn raise_A() -> Result<_, ErrorA>;
fn raise_B() -> Result<_, ErrorB>;
fn raise_C() -> Result<_, ErrorC>;

fn raise_AB() -> Result<_, enum impl Error> { // no ErrorC can be raised here
    raise_A()?;
    raise_B()?;
}
fn raise_BC() -> Result<_, enum impl Error> { // no ErrorA can be raised here
    raise_B()?;
    raise_C()?;
}

fn raise_ABC() -> Result<_, enum impl Error> { // all 3 errors can be raised here
    raise_AB()?;
    raise_BC()?;
}

If we want to have the exact same semantics using regular enums, we need to:

create types ErrorAB, ErrorBC (and the catch-all ErrorABC)
implement Into<ErrorAB> for ErrorA and ErrorB, likewise Into<ErrorBC> for ErrorB and ErrorC
implement Into<ErrorABC> for ErrorAB, ErrorBC, and most probably for ErrorA, ErrorB, ErrorC

If we don't create all this boilerplate:

we still need to create ErrorABC
we still need to implement Into<ErrorABC> for ErrorA, ErrorB and ErrorC
doing an exhaustive match of the errors raised by raise_AB() would require adding an unreachable branch for ErrorC, and likewise for ErrorA with raise_BC().
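To make the amount of boilerplate concrete, here is a rough sketch of the first option in current Rust. It uses From rather than Into, since `?` converts through From and an Into impl comes for free. The unit-struct errors and the bool parameters are assumptions made only so the example can run. ErrorBC is omitted because it would mirror ErrorAB.

```rust
#[derive(Debug, PartialEq)]
struct ErrorA;
#[derive(Debug, PartialEq)]
struct ErrorB;
#[derive(Debug, PartialEq)]
struct ErrorC;

// Exactly the errors raise_ab can produce.
#[derive(Debug, PartialEq)]
enum ErrorAB {
    A(ErrorA),
    B(ErrorB),
}

// The catch-all used near the root.
#[derive(Debug, PartialEq)]
enum ErrorABC {
    A(ErrorA),
    B(ErrorB),
    C(ErrorC),
}

impl From<ErrorA> for ErrorAB {
    fn from(e: ErrorA) -> Self {
        ErrorAB::A(e)
    }
}

impl From<ErrorB> for ErrorAB {
    fn from(e: ErrorB) -> Self {
        ErrorAB::B(e)
    }
}

// Widening ErrorAB into the catch-all flattens its variants.
impl From<ErrorAB> for ErrorABC {
    fn from(e: ErrorAB) -> Self {
        match e {
            ErrorAB::A(a) => ErrorABC::A(a),
            ErrorAB::B(b) => ErrorABC::B(b),
        }
    }
}

impl From<ErrorC> for ErrorABC {
    fn from(e: ErrorC) -> Self {
        ErrorABC::C(e)
    }
}

fn raise_a(fail: bool) -> Result<(), ErrorA> {
    if fail { Err(ErrorA) } else { Ok(()) }
}

fn raise_b(fail: bool) -> Result<(), ErrorB> {
    if fail { Err(ErrorB) } else { Ok(()) }
}

fn raise_c(fail: bool) -> Result<(), ErrorC> {
    if fail { Err(ErrorC) } else { Ok(()) }
}

// No ErrorC can be raised here, and the return type says so.
fn raise_ab(fail_a: bool, fail_b: bool) -> Result<(), ErrorAB> {
    raise_a(fail_a)?;
    raise_b(fail_b)?;
    Ok(())
}

// All three errors can be raised here.
fn raise_abc(fail_a: bool, fail_b: bool, fail_c: bool) -> Result<(), ErrorABC> {
    raise_ab(fail_a, fail_b)?;
    raise_c(fail_c)?;
    Ok(())
}
```

Every additional subset of errors needs its own enum plus one From impl per member, which is the growth the post is describing.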

CAD97 · August 5, 2020, 2:17pm

If it's valid to reinterpret &(sized type) as a forwarding &dyn Any downcastable to the variants, then yes, but it doesn't have to be, because (as you rightly point out) it can't (soundly) be. The sized type doesn't have to actually implement Any; I'm talking about semantics of Any, not necessarily specifically providing an implementation of Any.

My trivial Sized reinterpretation is literally just taking Box<dyn Any>'s (*const vtable, *const payload) and rearranging it to (*const vtable, payload). The payload is dynamically sized, so we take the maximum size/align and always use it. Nothing goes wrong so far.

If you want to get a downcastable dyn Any to the variants, you then take the address of the payload to turn (*const vtable, payload) into (*const vtable, *const payload) again. You then have a real &dyn Any and can use it with no problems.

The only potential stumbling block is turning &(*const vtable, payload) into &dyn Any, as there are two possible things that you could be talking about: the container itself, or the variants. This is the exact same as taking the reference to Box<T>; do you want &Box<T> as &dyn Any or &T as &dyn Any?

My trivial Sized version of dyn Any is "just" a stack Box with a set maximum size/alignment. It's nothing special, thus, "trivial."
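A hand-written approximation of this layout can be sketched in current Rust. Everything here is an assumption made for illustration: the names Payload, VTable and InlineAny, the choice of u32 and String as the known variants, and the function-pointer table standing in for a real compiler-generated vtable. The payload is stored inline in a union, and as_any takes its address to hand out an ordinary &dyn Any.

```rust
use std::any::Any;
use std::mem::ManuallyDrop;

// Inline storage, sized and aligned for the largest known variant.
union Payload {
    number: ManuallyDrop<u32>,
    text: ManuallyDrop<String>,
}

// Hand-rolled stand-in for a vtable: one entry per operation.
struct VTable {
    as_any: unsafe fn(&Payload) -> &dyn Any,
    drop_payload: unsafe fn(&mut Payload),
}

unsafe fn number_as_any(p: &Payload) -> &dyn Any {
    // SAFETY: the caller guarantees `number` is the active field.
    let value: &u32 = unsafe { &*p.number };
    value
}

unsafe fn text_as_any(p: &Payload) -> &dyn Any {
    // SAFETY: the caller guarantees `text` is the active field.
    let value: &String = unsafe { &*p.text };
    value
}

unsafe fn number_drop(_p: &mut Payload) {
    // u32 needs no cleanup.
}

unsafe fn text_drop(p: &mut Payload) {
    // SAFETY: `text` is active and is never touched again after this call.
    unsafe { ManuallyDrop::drop(&mut p.text) }
}

static NUMBER_VTABLE: VTable = VTable {
    as_any: number_as_any,
    drop_payload: number_drop,
};

static TEXT_VTABLE: VTable = VTable {
    as_any: text_as_any,
    drop_payload: text_drop,
};

// (vtable, payload) instead of Box's (vtable, pointer to payload).
struct InlineAny {
    vtable: &'static VTable,
    payload: Payload,
}

impl InlineAny {
    fn from_number(n: u32) -> Self {
        InlineAny {
            vtable: &NUMBER_VTABLE,
            payload: Payload { number: ManuallyDrop::new(n) },
        }
    }

    fn from_text(s: String) -> Self {
        InlineAny {
            vtable: &TEXT_VTABLE,
            payload: Payload { text: ManuallyDrop::new(s) },
        }
    }

    // Borrow the payload as a real `&dyn Any` over the variant.
    fn as_any(&self) -> &dyn Any {
        // SAFETY: the constructors pair each payload with its own vtable.
        unsafe { (self.vtable.as_any)(&self.payload) }
    }
}

impl Drop for InlineAny {
    fn drop(&mut self) {
        // SAFETY: same pairing invariant as in `as_any`.
        unsafe { (self.vtable.drop_payload)(&mut self.payload) }
    }
}
```

Note that InlineAny itself does not implement forwarding for Any. It only offers a method that produces a &dyn Any for the stored variant, which sidesteps the container-versus-variant ambiguity discussed above.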

(This does reraise the important factor that it's not possible for any anonymous enum to forward all safe object-safe traits, because a forwarding impl can conflict with a blanket impl, though.)

But if the goal is to forward all errors to the root where they're handled, does this matter?

Also, I'm fairly sure that in most reasonable enum impl Trait proposals, raise_ABC would not be returning an enum(A, B, C), but an enum(<raise_AB as Fn>::Output::E, <raise_BC as Fn>::Output::E). The enum impl Trait feature is just as opaque as impl Trait is. You can only use an impl Trait through the trait impl, whether that impl Trait is backed by a single type, an enum, a type union, or whatever.

robinm · August 5, 2020, 2:44pm

If the goal was only to forward all errors to the root, it would not matter. This was a question I had, and I cannot find who answered it (lazy loading makes Ctrl+F unusable).

However one may want to:

match partially on the types returned by raise_AB() and raise_BC(). Adding an arm for ErrorC or ErrorA, respectively, when matching on the catch-all enum would not even raise a warning, even though that case is unreachable (because the catch-all enum says that it may occur in the future).
match exhaustively on all the types returned by raise_AB() and raise_BC(). Doing so with a catch-all enum would require handling ErrorC and ErrorA respectively, even though once again they are unreachable.
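The exhaustive case can be illustrated with a small sketch in current Rust. The fieldless ErrorABC enum, the u32 inputs and the describe helper are assumptions made up for the example.

```rust
#[derive(Debug)]
enum ErrorABC {
    A,
    B,
    C,
}

// Can only ever fail with A or B, but the signature cannot express that.
fn raise_ab(input: u32) -> Result<u32, ErrorABC> {
    match input {
        0 => Err(ErrorABC::A),
        1 => Err(ErrorABC::B),
        n => Ok(n),
    }
}

// Some other function is the only source of C.
fn raise_c() -> Result<u32, ErrorABC> {
    Err(ErrorABC::C)
}

fn describe(result: Result<u32, ErrorABC>) -> &'static str {
    match result {
        Ok(_) => "ok",
        Err(ErrorABC::A) => "A",
        Err(ErrorABC::B) => "B",
        // Required for exhaustiveness even though raise_ab never returns it.
        Err(ErrorABC::C) => unreachable!("raise_ab never returns ErrorC"),
    }
}
```

The compiler accepts the C arm without complaint and would reject the match without it, so the fact that raise_ab cannot produce C lives only in the comment.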

That's absolutely true, but at the same time, it's trivial to coerce it into enum (A, B, C) before matching (and thus the usage is homomorphic to enum (A, B, C)).

Jon-Davis · August 5, 2020, 3:26pm

I think my confusion stems from the ability to coerce an enum impl Trait into a concrete enum. I figured if you can take an opaque enum and make it transparent through coercion, you could also take an opaque enum and match on it as if it were transparent.

That's a good point, the enum impl Trait feature is there to enable you to propagate up to the root. If you then need to specify a coercion at the root to the full concrete types, are you actually saving anything?

let output: Result<usize, enum(io::Error, ParseIntError, SqlError, HttpError)> = fail();
match output {}

vs

#[derive(Debug, Error)]
enum FailError {
    IoError(#[from] io::Error),
    ParseIntError(#[from] ParseIntError),
    SqlError(#[from] SqlError),
    HttpError(#[from] HttpError),
}
let output: Result<usize, FailError> = fail();
match output {}

At the beginning of the workshopping the code looked very nice, as coercion was implicit and magic was the name of the feature, but as more and more type information needed to be included and the magic was stripped away, we are left with needing to write out most things explicitly. I imagine that people would get sick of writing out all the as enum(io::Error, ParseIntError, SqlError, HttpError) and fn() -> enum(io::Error, ParseIntError, SqlError, HttpError) and would just start using type aliases like FailError = enum(io::Error, ParseIntError, SqlError, HttpError). And at that point, just use an enum with thiserror.

If the goal is to essentially generate a stack-allocated trait object, the main benefit is that enums would be stack allocated, which is cheaper, and wouldn't use a vtable (which is arguably negligible). If allocation cost is the primary concern, then using a custom allocator once the Alloc trait is stable should handle that issue. This would probably even result in better performance, as the Err return would only be a reference.

If the goal is to be the primary use case for error handling, but still include no magic, or negate the magic by requiring a full concrete coercion, then it is actually more verbose to write out anonymous enums over and over again, than it would be to write your own enum.

I actually think that the primary use case for anonymous enums as workshopped wouldn't be propagating errors throughout a crate. I think the use cases for anonymous enums would be:

When a function can return more than one type, but it would be just slightly inconvenient to make a whole enum for one function.
Returning multiple anonymous types such as those created with closures, iterators, futures, etc.

They still have value; I just wonder if long chains of enum impl Errors would actually be written. If you are using the same error in so many places, at that point it might deserve to become an actual type on its own.

Jon-Davis · August 5, 2020, 5:50pm

I think there are ergonomic gains to still be found

let mut x = 5u32 as enum(u32, bool);
x = false as enum;

With type inference and maybe a syntax more in line with await, the above expression could be

let mut x = 5u32.enum;
x = false.enum;

Function headers and other places where the type is required will still be required, but in a local let statement the type could be inferred. This syntax can also be extended if a specific variant is desired.

let mut x = 5u32.enum::0;
x = -5u32.enum::1;

Additionally, I no longer think the path of matching on a concrete type given an enum impl Trait is the right one to take. The type is opaque and should remain that way. I also see enum impl Trait as more useful for creating anonymous enums for anonymous types than for error handling. You could still use it for error handling, but really only for the catch-all case.

fn random_function() -> enum impl Fn() -> usize;

That said I like @robinm's idea of specifying some concrete types in the enum impl. This would allow users to add new errors without having to propagate the changes, unless they specifically wanted to match on that case.

fn fail() -> enum(io::Error, ParseIntError) impl Error;
// I kind of like the variadic syntax a bit better to convey this, but I'm biased.
fn fail() -> enum(io::Error, ParseIntError, impl Error...);

robinm · August 5, 2020, 6:29pm

Just a quick remark: if enum impl Trait is completely opaque and the anonymous enum can safely implement any safe trait, then there is no difference between fn foo() -> enum impl Trait and fn foo() -> impl Trait (and I prefer the latter).

CAD97 · August 5, 2020, 6:38pm

That is the common intent, for -> enum impl Trait to be equivalent to -> impl Trait to the consumer of the function. The only reason "enum impl Trait" is a separate thing from impl Trait is so that impl Trait can require type unification, so that extra wrapping enum layers aren't accidentally added.

The "enum impl Trait feature" is an opaque type, just like impl Trait is. We can have a similar feature that is downcastable to some set of known types, but it should be a distinct feature from the "enum impl Trait feature."
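What an opaque enum impl Trait would presumably generate can be written by hand today for traits that are implementable on stable, such as Iterator. The EitherIter type and the evens_or_odds function below are hypothetical names used only to sketch the idea. The caller sees nothing but impl Iterator and has no way to tell that an enum sits behind it.

```rust
// Hand-written stand-in for the autogenerated enum.
enum EitherIter<A, B> {
    Left(A),
    Right(B),
}

// Forward the trait to whichever variant is active.
impl<A, B> Iterator for EitherIter<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            EitherIter::Left(a) => a.next(),
            EitherIter::Right(b) => b.next(),
        }
    }
}

// The two branches have different concrete types (two distinct closures),
// but the wrapper unifies them into the single type `impl Trait` requires.
fn evens_or_odds(even: bool, limit: u32) -> impl Iterator<Item = u32> {
    if even {
        EitherIter::Left((0..limit).filter(|n| n % 2 == 0))
    } else {
        EitherIter::Right((0..limit).filter(|n| n % 2 == 1))
    }
}
```

Calling evens_or_odds(true, 6).collect::<Vec<_>>() yields the even numbers below 6. The explicit Left and Right wrappers are the part an enum marker would be expected to generate.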

traviscross · August 5, 2020, 10:55pm

fn make_frob(x: u32) -> impl Fn(u32) -> u32 {
  if x % 2 == 0 {
    |y| y / x
  } else {
    |y| y / (x + 1)
  }
}

If the code above could be made to work, it would be a huge win for Rust.

When people are learning the language, they universally expect that code to work and are surprised when it doesn't. When teaching, I'd rather have to explain the caveat about the minor overhead due to the anonymous enum rather than having to explain the compiler error and how to work around it, and why things are so much different once you add the conditional branch. The potential problem of accidentally adding an anonymous enum seems minor in comparison. It's almost just an implementation detail.

Besides, if the mental model is that the anonymous enum is an opaque type that implements the trait, and the function body is returning that anonymous enum, then it would be terribly surprising if impl Trait didn't cover that. We wouldn't want to have to explain that, "a return type of impl Trait means that the function returns some type that implements the trait, unless that type is an anonymous enum, in which case there's a special syntax for that."
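For comparison, one way to get make_frob to compile on stable Rust today without boxing is to write the enum by hand. This is only a sketch of a possible workaround, not the proposed feature. Either and its call method are names assumed for the example, and since Fn cannot be implemented for user types on stable, the enum is wrapped in one more closure.

```rust
// Hand-written two-variant enum over the two closure types.
enum Either<L, R> {
    Left(L),
    Right(R),
}

impl<L, R> Either<L, R>
where
    L: Fn(u32) -> u32,
    R: Fn(u32) -> u32,
{
    // Dispatch to whichever closure is stored.
    fn call(&self, y: u32) -> u32 {
        match self {
            Either::Left(f) => f(y),
            Either::Right(g) => g(y),
        }
    }
}

fn make_frob(x: u32) -> impl Fn(u32) -> u32 {
    // Both branches now have the same type: Either<closure 1, closure 2>.
    let frob = if x % 2 == 0 {
        Either::Left(move |y: u32| y / x)
    } else {
        Either::Right(move |y: u32| y / (x + 1))
    };
    // A single outer closure gives `impl Fn` one concrete type to name.
    move |y: u32| frob.call(y)
}
```

With this, make_frob(2)(10) divides by 2 and make_frob(3)(8) divides by 4. The match inside call is the runtime branch that an anonymous enum would also need.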

atagunov · August 6, 2020, 11:06pm

Replying to posts around #70: type level sets (union type) vs anonymous enums (sum type)

If an fn .. -> impl Trait actually returns (A|B) it appears highly undesirable for a match to see through impl Trait and distinguish A from B in this result:

fn f() -> impl Trait { let r: (A|B) = ...; r }
fn g<T>(t: T) -> (A|T) { .. }
..
match g(f()) {...
    /* we shouldn't be able to peek inside the value returned by f(),
       lest its implementation details leak */
}

This appears more consistent with anonymous enums than with the type-level sets proposal, because a naive/efficient implementation of the type sets proposal that looks at the TypeId or another form of identity of the underlying variants might be able to discern that f returns (A|B).

Ixrec · August 7, 2020, 10:20am

This seems like a good point to do yet another round of "enum impl Trait recap" for those unfamiliar with the old threads.

In the past, it was consistently and universally agreed that we did not want regular impl Trait syntax to autogenerate enums in order to "make stuff compile"/"do what I mean", to the point that I'm not sure we ever bothered to argue for it. In other words, everyone agreed that we should not allow code like the make_frob example above to compile as-is, and that we want to add some kind of explicit syntactic marker in there to say "I want an autogenerated enum", ideally just a single enum keyword. The usual sticking point back then was where the marker should go, since -> enum impl Trait function signatures have the obvious problem that the enum-ness is irrelevant to the caller and signatures are supposed to be about the caller/callee contract, but the only other option anyone could think of was to mark every single return site, e.g. enum(|y| y / x), and that O(n) verbosity seems to completely miss the point of a syntax sugar feature like this. I always felt enum impl Trait in the signature was the lesser evil.

But evidently the situation has changed and it's now up for debate whether any syntax at all should be required to opt-in to enum generation. So let's debate it:

My current position is a strong yes, we do need some extra syntax to opt-in to this, because in a language like Rust I feel it's very important for users to understand that those |y| y / x and |y| y / (x + 1) expressions have two different types and the function's going to be returning an autogenerated enum at runtime to distinguish between the two with all the extra runtime branches and indirections that implies. I believe trying to hide this / not forcing novices to understand this would be a mistake very similar to the "bare trait" syntax which we eventually changed back to dyn Trait. Also, it should be fairly easy for the compiler to detect "returning impl Trait with multiple types" and say "did you want enum impl Trait?" so any novice who expects this to "just work" will get what they wanted a few seconds later anyway.
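The claim that the two closure expressions have different types can be checked directly in current Rust. The helper type_id_of and the function closure_types_differ are assumed names for this small sketch.

```rust
use std::any::TypeId;

// Returns the TypeId of whatever concrete type the reference points to.
fn type_id_of<T: 'static>(_value: &T) -> TypeId {
    TypeId::of::<T>()
}

fn closure_types_differ(x: u32) -> bool {
    // Both closures capture the same `u32` and have the same signature,
    // yet each closure expression gets its own anonymous type.
    let first = move |y: u32| y / x;
    let second = move |y: u32| y / (x + 1);
    type_id_of(&first) != type_id_of(&second)
}
```

closure_types_differ returns true for any input, which is why the two branches of make_frob cannot unify under plain impl Trait.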

atagunov · August 7, 2020, 4:14pm

Suppose non-generic code needs to construct a value of an "ambiguous" enum type and pass it to generic code, and the value needs to be deeply nested in a large object.

eaglgenes101's RFC (draft?) suggests

let original = (f64 | f64)::1(0.0_f64)

which I find - along with the whole RFC - surprisingly concise, rigorous and easy to understand

lordan · August 7, 2020, 6:39pm

I also very strongly agree that the conversion to an enum needs to be explicit.

Reading these threads I've also come around to thinking that this doesn't need to (perhaps even shouldn't) be part of the signature. Some suggestions (hopefully new ones):

fn make_frob(x: u32) -> impl Fn(u32) -> u32 {
  type result = enum impl Fn(u32) -> u32;
  if x % 2 == 0 {
    |y| y / x as result   // or without alias: |y| y / x as enum impl Fn(u32) -> u32
  } else {
    |y| y / (x + 1) as _  // shorthand for later instances, rely on unification
  }
}

Or perhaps even more concise, assuming as enum will be unified with the impl Trait from the signature to result in an enum impl Trait, which then later return values can further reduce:

fn make_frob(x: u32) -> impl Fn(u32) -> u32 {
  if x % 2 == 0 {
    |y| y / x as enum 
  } else {
    |y| y / (x + 1) as _
  }
}

Or some syntax that allows us to further constrain (or "pre-construct") the return value:

fn make_frob(x: u32) -> impl Fn(u32) -> u32 {
  let return: enum impl Fn(u32) -> u32;   
  // let return: enum impl _;    // if we can assume the trait(s) from the signature
  // let return: enum;           // assuming the unifying logic from the second example
  if x % 2 == 0 {
    |y| y / x                    // assign to "existing" return enum
  } else {
    |y| y / (x + 1)
  }
}

traviscross · August 8, 2020, 3:22am

Thanks @Ixrec for the thoughts and the recap on old threads.

The problem with putting this in the type is that it just doesn't make any sense as a part of the type. If the anonymous enum in fact implements the trait, and is a value that can be passed around as something that implements the trait, then I just don't see how we could ever justify to ourselves that impl Trait wouldn't work with the anonymous enum.

In particular, it seems too horrible that the enum keyword would logically have to move around. E.g.:

// The anonymous enum is returned directly
fn make_frob(x: u32) -> enum impl Fn(u32) -> u32 {
  if x % 2 == 0 { |y| y/x } else { |y| y/(x+1) }
}

// The anonymous enum is saved to a variable first,
// after we support `impl Trait` in let
fn make_frob(x: u32) -> impl Fn(u32) -> u32 {
                     // ^-- is `enum` required, optional,
                     // or prohibited here?
  let ret: enum impl Fn(u32) -> u32 =
    if x % 2 == 0 { |y| y/x } else { |y| y/(x+1) };
  ret
}

// What about here?
fn make_frob(x: u32) -> impl Fn(u32) -> u32 {
  let f = || {
    if x % 2 == 0 { |y| y/x } else { |y| y/(x+1) }
  };
  f()
}

If we were going to add extra syntax for it, it seems much more like a kind of constructor, or a kind of block. E.g.:

fn make_frob(x: u32) -> impl Fn(u32) -> u32 {
  enum {
    if x % 2 == 0 {
      |y| y / x
    } else {
      |y| y / (x + 1)
    }
  }
}

I.e., an unnamed enum block returns an anonymous enum. That -- or something like it that operates on a block (e.g. an enum! macro) -- makes sense at least. If it were an enum! macro, perhaps it could be lowered in such a way that each branch of the conditional is cast into the anonymous enum in a way that can't necessarily be represented in the surface syntax.

Jon-Davis · August 8, 2020, 4:25am

I'm leaning towards using impl Trait but coercing the return types to an enum. That seems to be the most consistent with the current anonymous enum proposal, as well as staying in line with impl Trait only returning one type.

fn make_frob(x: u32) -> impl Fn(u32) -> u32 {
  if x % 2 == 0 {
    |y| y / x as enum 
  } else {
    |y| y / (x + 1) as enum
  }
}

traviscross · August 8, 2020, 5:18am

Regarding the question of whether we can do without any extra syntax at all, let's think about this in terms of our different kinds of users.

People learning or teaching Rust

In a comment above, I argued on behalf of novice users, and those that have to teach them. Clearly a Rust user at some point needs to learn the deep truths about how things like closures, iterators, and futures are implemented and treated in the type system. But I don't see the value in beating people down with compiler errors early on in the journey when the code could be made to work in a principled and straightforward manner with truly nominal and unavoidable overhead (more on that below). They can learn that Rust closures are hard at any time. No need to force it on people early.

Consider that Rust compiles this code today:

fn make_frob(x: u32) -> impl Fn(u32) -> u32 {
  if x % 2 == 0 {
    |y| y / 2
  } else {
    |y| y * 2
  }
}

But obviously as soon as one of the (syntactic) closures captures a value, as in the earlier make_frobs, it will break. The reasons for this are subtle, and I think everyone learning Rust should understand those things eventually. But it would be OK if the language let us teach those things later.
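One of the subtle reasons is that closures which capture nothing can coerce to plain function pointers, so both branches can share the type fn(u32) -> u32. The sketch below makes that coercion explicit by naming the function pointer type in the signature. This is a variation written for illustration, not the code from the post.

```rust
// Non-capturing closures coerce to `fn(u32) -> u32`, so both branches
// end up with the same function pointer type.
fn make_frob(x: u32) -> fn(u32) -> u32 {
    if x % 2 == 0 {
        |y: u32| y / 2
    } else {
        |y: u32| y * 2
    }
}
```

Once a closure captures x, this coercion is no longer available, and the two branches are back to having two unrelated anonymous types.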

The average user

In terms of the effect this would have on the average user, I'm reminded of the work on non-lexical lifetimes (NLL) and improved match ergonomics.

In each of these cases, it could be (and was) argued that forcing people to be more explicit is better. But we instead decided to make large swaths of reasonable-looking code "just work." It seems that has been a success.

Regarding the NLL work, I recall specifically someone on the lang team pointing out, and I'm paraphrasing from memory, that "rejecting correct programs is not a desirable feature."

It seems to me the desire for extra syntax is born from a concern that the average user will not understand what's going on and will accidentally add some extra overhead. But if it's not too much to ask that the average user understand that, e.g., every closure has a different anonymous type (except things that look like closures but are not because they don't capture), then it's hard for me to see how it's too much to ask for that user to know that returning a different type from a branch will result in an anonymous enum.

Besides, what's the user supposed to do instead? If the user really wants to return two different types, then it's difficult to see what better option the user has.
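The usual answer in current Rust is a boxed trait object, sketched below for the capturing make_frob. It compiles today, at the cost of one heap allocation per call to make_frob and a dynamic call through the vtable.

```rust
// Each branch boxes its own closure type. Both boxes then coerce to the
// same trait object type, so the branches unify.
fn make_frob(x: u32) -> Box<dyn Fn(u32) -> u32> {
    if x % 2 == 0 {
        Box::new(move |y: u32| y / x)
    } else {
        Box::new(move |y: u32| y / (x + 1))
    }
}
```

The return type changes from impl Fn to Box<dyn Fn>, so this workaround is visible in the signature and to every caller.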

People coming from auto-boxing languages

Anyone coming from Python, JavaScript, Lua, Lisp, Haskell, etc. really wants make_frob to just work the first and every time they try it. If a language like Rust can make the code these people want to write work without boxing, that's the good kind of magic.

The expert user

Experts will get used to whatever we do, and in particular, will know immediately and intuitively that returning a different type from a branch will result in an anonymous enum, just like these experts know immediately and intuitively right now that such code will result in a compiler error. Personally, I'd be happy that it did exactly what I wanted there with less typing.

Critically, this is not to say that all forms of explicitness are bad. Moving to dyn Trait was a truly good change for a whole host of reasons that mostly don't apply here.

People concerned about churn in the language

Some people are concerned about the rate or perceived rate of change of the language. If we add new syntax, then all users need to learn it, even if they don't use it themselves, as they will surely run across it in the code of others. Conversely, if we just make code that looks like it should work actually work without new syntax, then someone unfamiliar with anonymous enums may be surprised that a particular block of code works, but they're unlikely to be confused about what that code means and what result it produces.

As happened with NLL, I expect most users surprised that a particular block of code compiles would think, "huh, that's neat, I guess Rust finally decided to accept the obvious code there."

People concerned about zero-cost abstractions

Stroustrup's rule for zero-cost abstractions is:

What you don't use, you don't pay for. And further: What you do use, you couldn't hand code any better.

Anonymous enums without extra syntax meet this standard. No code that compiles today incurs any additional overhead due to these enums. And if you're trying to return, e.g., two or more different types of closures, or iterators, or futures, then there is no way you could implement it any better than the anonymous enum.

In summary, I'm a big fan of being explicit in general, but if we can make code that looks like it should work actually work in a principled manner and in a way that doesn't add avoidable overhead, and if it's better for all or most of our kinds of users, then it seems that we should at least strongly consider just doing that.

P.S. Any syntax we don't add is syntax that we don't have to bikeshed.

Ixrec · August 8, 2020, 9:31am

One issue with as enum I completely forgot to mention is that I’m not sure how it’s supposed to work with ?, which obviously matters for the error use case. fallible_op()? as enum would mean coercing the success value, not the error value.

Similarly, you often want the return type to be Result<T, enum impl Error> in functions with lots of ?s, not enum impl Try because the caller's gonna want that T, and I'm not sure how the former could work well with as enum markers on return sites.
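The asymmetry comes from how `?` works today: it leaves the success value alone and converts only the error, through a From impl chosen by the function's declared return type. A small sketch, with AppError and parse_non_negative as names assumed for the example:

```rust
use std::num::ParseIntError;

#[derive(Debug)]
enum AppError {
    Parse(ParseIntError),
    Negative(i64),
}

impl From<ParseIntError> for AppError {
    fn from(e: ParseIntError) -> Self {
        AppError::Parse(e)
    }
}

fn parse_non_negative(s: &str) -> Result<i64, AppError> {
    // `?` yields the i64 on success. On failure it converts the
    // ParseIntError into AppError via `From` and returns early.
    let n: i64 = s.trim().parse()?;
    if n < 0 {
        return Err(AppError::Negative(n));
    }
    Ok(n)
}
```

The conversion target is known only from the signature, so a marker placed after the `?` expression would apply to the i64, not to the error being converted.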
