pre-RFC: anonymous enums


#1

In Rust we have an anonymous structure type for each number of elements: the tuples.

But we don’t have a corresponding anonymous type for enums.

Suppose we had anonymous enums that play the same role for enums that tuples play for structs. This would allow a match expression to return different types in different branches:

let v:Option<i32> = ...
...
let result = match v {
     Some(v) => v+2,
     None => "Error!"
}; // type of result: Enum{i32, &'static str}
...
match result {
    #0(v) => println!("{}", v), //v: i32
    #1(v) => println!("{}", v) //v: &'static str
}

This would fill that gap in the type system and allow writing programs that today require explicitly defining a new type.
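For comparison, here is roughly what the example above looks like in today’s Rust, where the anonymous enum has to be spelled out as a named type. This is only a sketch. I32OrStr, add_two_or_error and describe are made-up names, and the variants V0/V1 stand in for the proposed #0/#1.

```rust
// A named enum standing in for the proposed anonymous Enum{i32, &'static str}.
// V0/V1 play the role of the positional #0/#1 variants (hypothetical names).
enum I32OrStr {
    V0(i32),
    V1(&'static str),
}

// Each match arm produces a different payload type, wrapped in its own variant.
fn add_two_or_error(v: Option<i32>) -> I32OrStr {
    match v {
        Some(v) => I32OrStr::V0(v + 2),
        None => I32OrStr::V1("Error!"),
    }
}

// Consuming the value means matching on the variant to recover the concrete type.
fn describe(result: &I32OrStr) -> String {
    match result {
        I32OrStr::V0(v) => format!("{}", v), // v: &i32
        I32OrStr::V1(s) => format!("{}", s), // s: &&'static str
    }
}
```

With this, describe(&add_two_or_error(Some(40))) gives "42" and describe(&add_two_or_error(None)) gives "Error!". The enum declaration is the boilerplate the proposal would remove.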


Representing closed trait objects as enums
#2

There has been a lot of previous discussion of this; https://github.com/rust-lang/rfcs/issues/294 and https://github.com/rust-lang/rfcs/pull/1154 probably cover most of it.

I would be super keen on someone actually getting a workable RFC for this that combines well with impl Trait, so that it could replace futures::future::Either. I don’t personally have much to add to the discussion, though.


#3

Why does futures need to implement its own Either instead of reusing the either crate? :confused:

Rust could transform

fn run(x: bool) -> impl Display {
    if x { "string" } else { 1234 }
}

into

use std::fmt::{self, Display, Formatter};

// Compiler-generated enum; the real name would be unnameable in user code.
enum __AnonymousDisplayRun {
    Branch1(&'static str),
    Branch2(i32),
}
impl Display for __AnonymousDisplayRun {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        // Forward to whichever concrete type this value holds.
        match self {
            __AnonymousDisplayRun::Branch1(a) => a.fmt(f),
            __AnonymousDisplayRun::Branch2(a) => a.fmt(f),
        }
    }
}
fn run(x: bool) -> __AnonymousDisplayRun {
    if x {
        __AnonymousDisplayRun::Branch1("string")
    } else {
        __AnonymousDisplayRun::Branch2(1234)
    }
}

but I don’t consider this a viable solution, since every call to Display::fmt will need to check the tag of the enum, making it not a zero-cost abstraction. That’s even worse for iterators, since every call to .next() will involve a switch on the tag behind the scenes.
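To make that cost concrete, here is a hand-written sketch of what such a generated enum could look like for iterators. It is similar in spirit to the either crate’s Either, but Branches and evens_or_odds are made-up names. The point is that next() has to match on the tag on every call.

```rust
// Hypothetical stand-in for a compiler-generated two-variant enum.
enum Branches<A, B> {
    Branch1(A),
    Branch2(B),
}

impl<A, B> Iterator for Branches<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    // Every call inspects the tag before forwarding to the concrete iterator:
    // this is the per-call overhead being discussed.
    fn next(&mut self) -> Option<A::Item> {
        match self {
            Branches::Branch1(a) => a.next(),
            Branches::Branch2(b) => b.next(),
        }
    }
}

// Two different concrete iterator types behind one `impl Iterator` return.
fn evens_or_odds(even: bool) -> impl Iterator<Item = u32> {
    if even {
        Branches::Branch1((0u32..10).step_by(2))
    } else {
        Branches::Branch2((1u32..10).filter(|n| n % 2 == 1))
    }
}
```

Here evens_or_odds(true) yields 0, 2, 4, 6, 8, and evens_or_odds(false) yields 1, 3, 5, 7, 9.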


#4

I believe the “official” answer to that is https://github.com/alexcrichton/futures-rs/pull/271#issuecomment-264313963: “I’d prefer to keep the dependencies slim for now”.

I’ve suggested more or less this on a few other threads, but with an explicit syntactic opt-in like “enum impl Trait” instead of just “impl Trait”. I completely agree that it’s not a complete solution to “anonymous enums” in general and if we ever did this it’d need some kind of opt-in since plain impl Trait generating an enum would, as you say, have non-zero overhead.


#5

I don’t think it’s fair to decry this as a non-zero-cost abstraction. If you actually need to return one of a closed set of multiple types like that, then that is precisely the overhead you will always have to pay, whether or not you’re forced to write a bunch of boilerplate to accomplish it. For comparison, trait objects are in a similar sense not “zero-cost” compared to doing something else entirely, but sometimes they’re what you need, and so we have them.

It seems implausible to me that there is a serious danger of people significantly degrading the performance of their code by inadvertently returning enums.

In cases where you could instead match on the enum once and then make a great many method calls on the concrete type, there is perhaps a small (albeit branch-prediction-friendly) cost here (although isn’t LLVM supposed to hoist branches like that out of loops?). However, there are common cases where you have no choice. In futures you’re passing your anonymous enum into existing generic code, and the only alternative is boxing, which strikes me as a much higher price to pay. Even a hand-written Future impl would contain branching.
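For comparison, here is a sketch of the boxing alternative for the same kind of function (evens_or_odds_boxed is a made-up name). It avoids declaring an enum. Instead, it pays for a heap allocation up front and an indirect call through the vtable on every next().

```rust
// The boxing alternative: each branch is moved to the heap and coerced to a
// trait object, so callers see a single type at the cost of an allocation
// and dynamic dispatch on every call to next().
fn evens_or_odds_boxed(even: bool) -> Box<dyn Iterator<Item = u32>> {
    if even {
        Box::new((0u32..10).step_by(2))
    } else {
        Box::new((1u32..10).filter(|n| n % 2 == 1))
    }
}
```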

As discussed in IRC, it seems unfortunate to introduce type signature syntax which is not actually relevant to the externally visible semantics of the interface being defined.


#6

:thinking: Looks like LLVM does lift that out of the loop

Example code
extern crate either;
use either::Either;

#[inline(never)]
pub fn consume(a: u64) {
    println!("{}", a);
}

#[inline(never)]
pub fn do_iteration<I: Iterator<Item=u64>>(it: I) {
    for i in it {
        consume(i);
    }
}

struct A;
struct B;

impl Iterator for A {
    type Item = u64;
    #[inline(never)]
    fn next(&mut self) -> Option<u64> {
        Some(1)
    }
}

impl Iterator for B {
    type Item = u64;
    #[inline(never)]
    fn next(&mut self) -> Option<u64> {
        Some(2)
    }
}

fn main() {
    do_iteration(Either::Left::<A, B>(A));
    do_iteration(Either::Right::<A, B>(B));
}
Generated assembly for `do_iteration`
	.section	.text._ZN10playground12do_iteration17hd1e6f78936280d2cE,"ax",@progbits
	.p2align	4, 0x90
	.type	_ZN10playground12do_iteration17hd1e6f78936280d2cE,@function
_ZN10playground12do_iteration17hd1e6f78936280d2cE:
	.cfi_startproc
	pushq	%rbx
.Lcfi1:
	.cfi_def_cfa_offset 16
	subq	$16, %rsp
.Lcfi2:
	.cfi_def_cfa_offset 32
.Lcfi3:
	.cfi_offset %rbx, -16
	testb	%dil, %dil
	je	.LBB1_4
	movq	%rsp, %rdi
	callq	_ZN64_$LT$playground..B$u20$as$u20$core..iter..iterator..Iterator$GT$4next17h54dc687e4753bf0dE
	cmpq	$0, (%rsp)
	je	.LBB1_7
	movq	%rsp, %rbx
	.p2align	4, 0x90
.LBB1_3:
	movq	8(%rsp), %rdi
	callq	_ZN10playground7consume17hcc0db80a680e044dE
	movq	%rbx, %rdi
	callq	_ZN64_$LT$playground..B$u20$as$u20$core..iter..iterator..Iterator$GT$4next17h54dc687e4753bf0dE
	cmpq	$0, (%rsp)
	jne	.LBB1_3
	jmp	.LBB1_7
.LBB1_4:
	movq	%rsp, %rdi
	callq	_ZN64_$LT$playground..A$u20$as$u20$core..iter..iterator..Iterator$GT$4next17h5854e5cb157207c3E
	cmpq	$0, (%rsp)
	je	.LBB1_7
	movq	%rsp, %rbx
	.p2align	4, 0x90
.LBB1_6:
	movq	8(%rsp), %rdi
	callq	_ZN10playground7consume17hcc0db80a680e044dE
	movq	%rbx, %rdi
	callq	_ZN64_$LT$playground..A$u20$as$u20$core..iter..iterator..Iterator$GT$4next17h5854e5cb157207c3E
	cmpq	$0, (%rsp)
	jne	.LBB1_6
.LBB1_7:
	addq	$16, %rsp
	popq	%rbx
	retq
.Lfunc_end1:
	.size	_ZN10playground12do_iteration17hd1e6f78936280d2cE, .Lfunc_end1-_ZN10playground12do_iteration17hd1e6f78936280d2cE
	.cfi_endproc
Decompiled back to C pseudo-code

(Ignore the details; only the control-flow structure matters here.)

int __fastcall _1::do_iteration::hbe3238d1e588a15d(char a1)
{
  int result; // eax@2
  __int64 v2; // [sp+8h] [bp-18h]@2
  __int64 v3; // [sp+10h] [bp-10h]@3

  if ( a1 )
  {
    for ( result = _$LT$1..B$u20$as$u20$core..iter..iterator..Iterator$GT$::next::h1867f48de4bd7158(&v2);
          v2;
          result = _$LT$1..B$u20$as$u20$core..iter..iterator..Iterator$GT$::next::h1867f48de4bd7158(&v2) )
    {
      _1::consume::h9556e51215ea496c(v3);
    }
  }
  else
  {
    for ( result = _$LT$1..A$u20$as$u20$core..iter..iterator..Iterator$GT$::next::hc02abc75369d60a1(&v2);
          v2;
          result = _$LT$1..A$u20$as$u20$core..iter..iterator..Iterator$GT$::next::hc02abc75369d60a1(&v2) )
    {
      _1::consume::h9556e51215ea496c(v3);
    }
  }
  return result;
}

I still think making -> impl Trait automatically create an enum is not a good option, because it would turn accidentally returning values of different types into a silent logic error instead of a compile error.

If type signature syntax is not desirable, what about an attribute like

#[returns_anonymous_enum]
fn foo() -> impl Trait { ... }

#7

I don’t think this should be restricted to return position; it could be quite useful inside a method as well.

Imagine something like this, syntax TBD of course:

let iter = enum if rtl { v.iter().rev() } else { v.iter() };
iter.for_each(|item| {
   ... lots of complex code ...
});

Making it a property of the if or the match keeps it an implementation detail of the method, and is compatible with returning it as an impl Trait.
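For what it’s worth, within a single function you can already get something close to this without a new type or a Box. You keep both candidate iterators on the stack and use whichever one was initialized through a &mut dyn Iterator. This is dynamic dispatch rather than the enum proposed here, and walk is a made-up name:

```rust
// Pick the iteration direction at runtime without declaring an enum or boxing:
// both candidates are declared up front, only one is initialized, and the loop
// uses it through a `&mut dyn Iterator` (a vtable call per item).
fn walk(v: &[i32], rtl: bool) -> Vec<i32> {
    let mut backward;
    let mut forward;
    let iter: &mut dyn Iterator<Item = &i32> = if rtl {
        backward = v.iter().rev();
        &mut backward
    } else {
        forward = v.iter();
        &mut forward
    };

    let mut out = Vec::new();
    for item in iter {
        // ... lots of complex code would go here ...
        out.push(*item);
    }
    out
}
```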


#8

I’m not convinced that this is an easy mistake to make, or even that it’s a serious error if someone does manage to make it.

That said, I find this way of making it explicit much more agreeable than adding magic to the type signature. It should be noted that branches aren’t the only context in which a type like this could come up: you might want an enum-flavored variable that you initialize (or assign to) with values of several types. The if/match case is of more interest to me, though.


#9

Since “anonymous-enumization” is a property of the return type, if we want to generalize it to the expression level, it makes more sense to put the annotation on the type:

let a = if foo { bar() } else { baz() }: enum impl Trait;

let b = match foo {
    X => bar(),
    Y => baz(),
}: enum impl Trait;

let c = loop { 
    if foo { break bar() }
    if !foo { break baz() } 
}: enum impl Trait;

let d = [bar(), baz()]: [enum impl Trait; 2];

Or make the enumization itself an expression, like

let a = enumize!(if foo { bar() } else { baz() } => impl Trait);
let d = enumize!([bar(), baz()] => [impl Trait; 2]);

We shouldn’t be able to omit the Trait in the enum expression, unless either:

  • type inference can work out that for_each is a method of Iterator<Item=T>, a trait implemented by both types, or

  • the generated enum simply implements every trait and method shared by the two types, including those irrelevant to the current expression (e.g. Default)


#10

So, about enum-impl-Trait. Suppose we have

trait Heavy {
    fn new() -> Self;
    fn check(&self);
    fn collide(&self, other: &Self);
    fn sink(self: Box<Self>);
}

struct Ship([u8; 500]);
struct Iceberg([u64; 90000]);

impl Heavy for Ship { ... }
impl Heavy for Iceberg { ... }

fn make_a_heavy(thing: &str) -> enum impl Heavy {   // <---
    match thing {
        "ship" => Ship::new(),
        "iceberg" => Iceberg::new(),
        _ => panic!("invalid input"),
    }
}

let ship = Box::new(make_a_heavy("ship"));
ship.check();
ship.collide(&make_a_heavy("iceberg"));
ship.sink();

Now the problem is: how do we properly implement that anonymous type, enum impl Heavy? Consider my previous suggestion:

// Compiler-generated enum; the real name would be unnameable in user code.
enum __AnonymousHeavyMakeAHeavy {
    Branch1(Ship),
    Branch2(Iceberg),
}
impl Heavy for __AnonymousHeavyMakeAHeavy {
    fn new() -> Self {
        unreachable!()
    }
    fn check(&self) {
        match *self {
            Self::Branch1(ref a) => a.check(),
            Self::Branch2(ref a) => a.check(),
        }
    }
    fn sink(self: Box<Self>) {  // ???????
        // box patterns are unstable; the payload must be re-boxed.
        match self {
            box Self::Branch1(a) => Box::new(a).sink(),
            box Self::Branch2(a) => Box::new(a).sink(),
        }
    }
    fn collide(&self, other: &Self) { // ????????????????
        match (self, other) {
            (Self::Branch1(a), Self::Branch1(b)) => a.collide(b),
            (Self::Branch2(a), Self::Branch2(b)) => a.collide(b),
            _ => panic!("oh no I did not expect that"),
        }
    }
}
new()

It should not be possible to call associated functions without knowing the type itself, so it should be fine to implement this as unreachable.

check(&self)

This just forwards to the concrete type’s respective method. The same goes for methods that take &mut self or self.

sink(self: Box<Self>)

Now this one is interesting. Unlike with references and Self itself, you cannot transform a Box<Self> into a Box<Ship> at zero cost, because the enum and its payload generally differ in size and layout. You have to allocate a new box and perform a possibly expensive memcpy.

Furthermore, we used a box pattern here. If we do implement https://github.com/rust-lang/rust/issues/44874 (arbitrary self types), how are we going to destructure the Self out of a Mutex<Self>? Rc<Self>? my_crate::CrazySmartPointer<Self>?
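Here is a small sketch of the layout problem. The sizes are shrunk from the Ship/Iceberg example, and AnonHeavy and rebox_ship are made-up stand-ins for the generated code. The enum is at least as large as its biggest variant, so a Box<AnonHeavy> cannot simply be reinterpreted as a Box<Ship>. Forwarding sink therefore means moving the payload into a fresh allocation.

```rust
use std::mem::size_of;

// Smaller stand-ins for the Ship/Iceberg types above (sizes are arbitrary).
struct Ship([u8; 16]);
struct Iceberg([u64; 64]);

// Hypothetical stand-in for the compiler-generated enum.
enum AnonHeavy {
    Branch1(Ship),
    Branch2(Iceberg),
}

// What forwarding `sink(self: Box<Self>)` to Ship would have to do: move the
// payload out of the enum's allocation and into a new, differently sized one.
fn rebox_ship(b: Box<AnonHeavy>) -> Option<Box<Ship>> {
    match *b {
        AnonHeavy::Branch1(ship) => Some(Box::new(ship)), // new allocation + copy
        AnonHeavy::Branch2(_) => None,
    }
}
```

Checking size_of::<AnonHeavy>() against size_of::<Ship>() shows the enum is strictly larger here, which is why the old allocation cannot be reused as-is.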

collide(&self, other: &Self)

This one is plainly impossible to forward. The trait only defines how to collide a Ship with a Ship and an Iceberg with an Iceberg, so when we bring together two enum impl Heavy values holding different types, the only possible response is to panic. This is very undesirable for compiler-generated code!


If we implement enum-impl-Trait using enums, we also have to introduce a concept of “enum-impl-Trait-safe methods” (analogous to object-safe methods), which:

  • only take self, &self or &mut self as the first argument, and
  • do not refer to Self in any other argument, and
  • return either Self or a type that does not involve Self.
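As a sketch of what those rules allow, here is a made-up trait whose methods all follow them, together with the kind of forwarding impl the compiler could generate for enum impl Heavy. The method names and numbers are arbitrary and hypothetical. The interesting part is that &self, &mut self and self -> Self all forward by matching once. For a Self return, the result is re-wrapped in the same variant.

```rust
// A hypothetical trait whose methods all follow the proposed rules:
// the receiver is self/&self/&mut self, no other argument mentions Self,
// and the return type is either Self or does not involve Self.
trait Heavy {
    fn weight(&self) -> u64;
    fn load(&mut self, tons: u64);
    fn doubled(self) -> Self;
}

struct Ship(u64);
struct Iceberg(u64);

impl Heavy for Ship {
    fn weight(&self) -> u64 { self.0 }
    fn load(&mut self, tons: u64) { self.0 += tons; }
    fn doubled(self) -> Self { Ship(self.0 * 2) }
}

impl Heavy for Iceberg {
    fn weight(&self) -> u64 { self.0 * 1000 } // arbitrary scale factor
    fn load(&mut self, tons: u64) { self.0 += tons; }
    fn doubled(self) -> Self { Iceberg(self.0 * 2) }
}

// Roughly what `enum impl Heavy` could expand to for these two types.
enum AnonHeavy {
    Branch1(Ship),
    Branch2(Iceberg),
}

impl Heavy for AnonHeavy {
    fn weight(&self) -> u64 {
        match self {
            AnonHeavy::Branch1(a) => a.weight(),
            AnonHeavy::Branch2(a) => a.weight(),
        }
    }
    fn load(&mut self, tons: u64) {
        match self {
            AnonHeavy::Branch1(a) => a.load(tons),
            AnonHeavy::Branch2(a) => a.load(tons),
        }
    }
    // Returning Self is fine: re-wrap the result in the same variant.
    fn doubled(self) -> Self {
        match self {
            AnonHeavy::Branch1(a) => AnonHeavy::Branch1(a.doubled()),
            AnonHeavy::Branch2(a) => AnonHeavy::Branch2(a.doubled()),
        }
    }
}

fn make_a_heavy(thing: &str) -> AnonHeavy {
    match thing {
        "ship" => AnonHeavy::Branch1(Ship(1)),
        "iceberg" => AnonHeavy::Branch2(Iceberg(1)),
        _ => panic!("invalid input"),
    }
}
```

Methods like new(), sink(self: Box<Self>) and collide(&self, other: &Self) are exactly the ones that cannot be written this way.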

Or do we have other ideas to get around these restrictions?


#11

The Box<Self> and other: &Self cases remind me a lot of the issues with delegation syntax. In fact, I think it’s exactly the same problem, and we’d probably want the “delegation” done by enum impl Trait to work similarly to delegate * to Heavy, if only for consistency’s sake.


#12

I think that, for symmetry with tuples, the “disjoins” RFC still had the right idea, though I’d add the ability to refer to variants numerically (again, for symmetry with numeric indexing on tuples).

To recap, at the risk of beating a dead horse:

Types:

  • 0-variants: ! (the never type is already a zero-variant anonymous enum)
  • 1-variant: (A|)
  • 2-variants: (A|B) or (A|B|) (optional trailing |)
  • etc.

Expressions/patterns (positional form):

  • 0-variants: N/A
  • 1-variant: (x|)
  • 2-variants: (x|!) or (!|y), alternatively written (x|!|) or (!|y|)
  • etc.

Expressions/patterns (numeric form):

  • 0-variants: N/A
  • 1-variant: <(A|)>::0(x)
  • 2-variants: <(A|B)>::0(x) or <(A|B)>::1(y) (or with optional trailing |)
  • etc.

On its own this would make the numeric syntax cumbersome, which would defeat the point, except that you could give the type an alias if you need to keep referring to it:

type MyDisjoin = (A|B|C);
match blah {
    MyDisjoin::0(x) => ...,
    MyDisjoin::1(y) => ...,
    MyDisjoin::2(z) => ...,
}
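Nothing new is needed at runtime for this. An n-ary disjoin can already be encoded by nesting a binary enum and terminating it with an empty enum, much as tuples could be encoded as nested pairs. The sketch below uses made-up names (Or, Void, describe) and is only similar in spirit to HList-style coproducts. The nested patterns show why positional or numeric syntax would be welcome.

```rust
// Zero-variant enum: uninhabited, playing the role of `!` in the recap above.
enum Void {}

// A binary "this one or one of the rest" enum; nesting it gives n-ary disjoins.
enum Or<H, T> {
    Here(H),
    There(T),
}

// Rough stand-in for the proposed (i32|&'static str|bool).
type MyDisjoin = Or<i32, Or<&'static str, Or<bool, Void>>>;

fn describe(d: &MyDisjoin) -> String {
    match d {
        Or::Here(x) => format!("#0: {}", x),            // like MyDisjoin::0(x)
        Or::There(Or::Here(y)) => format!("#1: {}", y), // like MyDisjoin::1(y)
        Or::There(Or::There(Or::Here(z))) => format!("#2: {}", z), // like MyDisjoin::2(z)
        // The terminator can never be constructed, so this arm is an empty match.
        Or::There(Or::There(Or::There(never))) => match *never {},
    }
}
```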

#13

Let’s also not forget frunk's Coproduct feature =)